# Data Types

In [None]:
from datascience import *
from cs104 import *
import numpy as np

%matplotlib inline

## 1. Table Review: Art sales in the UK
[This data](https://github.com/thegetty/provenance-index-csv) comes from the [Getty Provenance Index](https://www.getty.edu/research/tools/provenance/), which currently contains more than 2.3 million records taken from source material such as archival inventories, auction catalogs, and dealer stock books. 

<img src="https://upload.wikimedia.org/wikipedia/commons/d/d8/Sir_Anthony_van_Dyck_-_Portrait_of_Antoine_Triest%2C_Bishop_of_Ghent_%281576%E2%80%931655%29_-_BF.1977.2_-_Hermitage_Museum.jpg" width=300>

Sir Anthony van Dyck - Portrait of Antoine Triest

**Recall:** you can open the raw `.csv` files within Jupyter's file system. From inside Jupyter, locate the CSV file in the File Browser on the left-hand side of the window. Double-click it to view it as a formatted table, or right-click and select "Open With -> Editor" to view it as an editable text file.

In [None]:
art = Table.read_table('data/UK_art_subset.csv')
art.show(5)

*Review*: Remove unhelpful columns (e.g. `'auction_house'`) with the `drop` method, and save the resulting table in the `no_house` variable.

In [None]:
no_house = art.drop('auction_house')
no_house.show(5)

In [None]:
art.sort('artist_name', descending=True).show(4)

Find non-painting objects using `Table.where(...)` and the predicate `are.not_equal_to()`.

In [None]:
not_a_painting = art.where('object_type', are.not_equal_to('Painting'))
not_a_painting

Now, a quick recap of `where` and the predicate tests to select rows.

In [None]:
art.where("auction_house", are.equal_to("Christie's"))

In [None]:
art.where("pounds", are.above(900))

In [None]:
art.where("lot_sale_year", are.between(1815, 1835))

In [None]:
art.where("title", are.containing('river'))
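Conceptually, `where` keeps the rows for which the predicate is true. The rough NumPy sketch below illustrates that idea with a boolean mask, using a hypothetical array of sale years (not the real `art` data) in place of the `lot_sale_year` column. The mask mimics `are.between(1815, 1835)`, which, per the `datascience` documentation, includes the lower bound and excludes the upper bound:

```python
import numpy as np

# Hypothetical sale years, standing in for art.column('lot_sale_year')
years = np.array([1801, 1815, 1822, 1835, 1850])

# Boolean mask mimicking are.between(1815, 1835): 1815 <= year < 1835
mask = (years >= 1815) & (years < 1835)

# Keep only the entries where the mask is True
selected = years[mask]
print(selected)  # [1815 1822]
```

Note that 1835 is excluded, so a sale in exactly 1835 would not appear in the `between` result above.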

## What else can we learn from these data sets?

**Think-pair-share:** What were the most expensive items sold after 1850?

**Recall:** method chaining lets us combine multiple steps into a single line.

In [None]:
art.where('lot_sale_year', are.above(1850)).sort('pounds', descending=True).show(5)

How much is £966 from 1859 worth in today's US dollars?

The pound has had an average inflation rate of about 3.15% per year since then, so £966 in 1859 is worth around £155,000 today.

In [None]:
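pounds_2024 = 155000              # approximate value of £966 (1859) in today's pounds
dollars_2024 = pounds_2024 * 1.3  # exchange rate of about 1.3 US dollars per pound
dollars_2024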
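The inflation figure comes from compounding: each year multiplies the price by $(1 + r)$. A minimal sketch, assuming a constant 3.15% rate over every year from 1859 to 2024 and the same 1.3 dollars-per-pound rate as the cell above. Compounding this way lands near £161,000, in the same range as the figure quoted above; the result is sensitive to the exact rate and number of years used.

```python
# Compound an assumed constant inflation rate of 3.15% per year
pounds_1859 = 966
annual_rate = 0.0315
years = 2024 - 1859  # 165 years

# Each year multiplies the value by (1 + annual_rate)
pounds_today = pounds_1859 * (1 + annual_rate) ** years

# Assumed exchange rate of 1.3 US dollars per pound
dollars_today = pounds_today * 1.3

print(round(pounds_today), round(dollars_today))
```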

## 2. Data Types

### Type

We can ask for the type of a value or variable with Python's built-in `type` function.

In [None]:
type(3)

In [None]:
temperature = 98.6
type(temperature)

In [None]:
prof_name = "Steve"
type(prof_name)

In [None]:
this_class_is_fun = True
type(this_class_is_fun)

### Floats

Python makes some decisions for us. What type of value is produced by multiplying a float by an int?

In [None]:
answer = 0.75 * 2
print(answer)   # only the last expression in a cell is displayed, so print this one
type(answer)
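A few more combinations, following Python's standard arithmetic rules: mixing an int with a float gives a float, `*` on two ints stays an int, and `/` always produces a float even when the result is a whole number.

```python
mixed = 0.75 * 2     # float times int
print(type(mixed))   # <class 'float'>

whole = 3 * 2        # int times int
print(type(whole))   # <class 'int'>

quotient = 6 / 3     # / always produces a float
print(quotient, type(quotient))  # 2.0 <class 'float'>
```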

A computer cannot represent every real number exactly: some numbers, like 1/3, have infinitely many digits, and storing them exactly would require infinite memory.

In [None]:
1 / 3

What happens when we uncomment and run the next cell?

In [None]:
# 2 / 0
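Uncommenting `2 / 0` and running it stops the cell with a `ZeroDivisionError`. A small sketch that catches the error instead, so its message can be printed without halting the notebook:

```python
# Dividing by zero raises a ZeroDivisionError; catching it keeps the cell from stopping
result = None
try:
    result = 2 / 0
except ZeroDivisionError as err:
    print("Caught ZeroDivisionError:", err)
```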

### Scientific Notation

Python displays very large and very small floats in scientific notation, as $b \times 10^e$.

Examples:
* `1.23e5` is $1.23 \times 10^5$.
* `6.667e-07` is $6.667 \times 10^{-7}$.

In [None]:
2 / 3000

In [None]:
2 / 3000000

In [None]:
0.000000000000000123456789

In [None]:
0.000000000000000000000000000000000000000000000000000000000000000000000123456789
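Scientific notation also works in the other direction: Python accepts it in float literals and in strings passed to `float`, and an f-string with the `e` format specifier produces it on demand. A short sketch:

```python
# Scientific notation is valid in float literals and strings
print(1.23e5)               # 123000.0
print(float('6.667e-07'))   # 6.667e-07

# Format a float in scientific notation with 2 digits after the decimal point
print(f"{2 / 3000000:.2e}")  # 6.67e-07
print(f"{123000:.2e}")       # 1.23e+05
```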

### Rounding Errors

Since numbers aren't always represented exactly, small errors may creep in when we operate on floats. These errors are too small for us to worry about in this class.

In [None]:
0.6666666666666666 - 0.6666666666666666123456789  # mathematically a little less than 0, but both literals round to the same float, so the result is 0.0

In [None]:
2 ** 0.5

In [None]:
2 ** 0.5 * 2 ** 0.5 # should be 2.0 

In [None]:
2 ** 0.5 * 2 ** 0.5 - 2 # should be 0 
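Because of these tiny errors, testing floats with `==` can give surprising answers. One common approach (a suggestion, not something this class requires) is to compare with a tolerance using the standard library's `math.isclose`:

```python
import math

product = 2 ** 0.5 * 2 ** 0.5

print(product == 2)               # False: product is 2.0000000000000004
print(math.isclose(product, 2))   # True: equal within the default relative tolerance
```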

### Strings 

String values capture text data (sequences of characters).  Use single quotes or double quotes around strings.

In [None]:
'Painting'

In [None]:
"Painting"

### Variables vs Strings

In [None]:
print("painting") # String value

painting = 4      # variable named painting
print(painting)

Why both single and double quotes?

In [None]:
'Don't always use single quotes'  # SyntaxError: the apostrophe in Don't ends the string early

In [None]:
"Don't always use single quotes"
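Switching to double quotes is one fix; another is to keep single quotes and *escape* the apostrophe with a backslash, which tells Python the quote is part of the text rather than the end of the string:

```python
# Escape the apostrophe with a backslash inside a single-quoted string
escaped = 'Don\'t always use single quotes'

# Or switch to double quotes, as in the cell above
double_quoted = "Don't always use single quotes"

print(escaped == double_quoted)  # True
```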

In [None]:
'cs' + '104' # concatenation

In [None]:
'cs' + ' ' +  '104' # spaces aren't added for you

### Conversions

The `+` operator can only concatenate *strings* with other strings.

In [None]:
number = 104
'cs' + number   # TypeError: can't concatenate a string and an int

*Convert* numbers to strings when you want to use them to build larger strings.

In [None]:
'cs' + str(number)

We can also convert strings back to numbers.

In [None]:
int('3')

In [None]:
float('3.0')

In [None]:
int(str(number)) * 2
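`int` only accepts strings that look like whole numbers. A short sketch of what happens with a decimal string, and one way around it by converting to `float` first. Note that `int` truncates rather than rounds; `round` gives the nearest integer instead:

```python
# int() rejects a string that contains a decimal point
value_error_raised = False
try:
    int('3.7')
except ValueError as err:
    value_error_raised = True
    print("ValueError:", err)

# Convert to float first; int() then drops the fractional part
converted = int(float('3.7'))
print(converted)  # 3

# round() gives the nearest integer instead
rounded = round(float('3.7'))
print(rounded)  # 4
```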

## 3. Arrays

Array: a sequence of values, all of the same type, "boxed up" together.

Table operation: `column` returns the values in a column as an array.

In [None]:
not_a_painting.column('pounds')
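Because every element of an array shares one type, mixing types forces a conversion. `make_array` comes from the `datascience` library and builds a NumPy array; the sketch below uses `np.array` directly, assuming it behaves the same way for these inputs:

```python
import numpy as np

# Ints and a float together: every element becomes a float
mixed_numbers = np.array([1, 2, 3.5])
print(mixed_numbers, mixed_numbers.dtype)  # [1.  2.  3.5] float64

# A number and a string together: every element becomes a string
mixed_text = np.array([104, 'cs'])
print(mixed_text)  # ['104' 'cs']
```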

Arithmetic operations are **broadcast** across arrays: the operation is applied to every element.

What's the price in dollars for each of these items? 

In [None]:
price_in_pounds = not_a_painting.column('pounds')
price_in_dollars = price_in_pounds * 1.3   # 1.3 US dollars per pound, applied to every price
price_in_dollars

Suppose the art auction house adds 5 pounds to each item's price. 

In [None]:
price_in_pounds + 5

In [None]:
fives = make_array(5, 10, 15, 20)
fives

In [None]:
price_in_pounds + fives
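A self-contained sketch of broadcasting with made-up prices (the real `price_in_pounds` values come from the table): multiplying or adding a single number applies it to every element, while adding two arrays of the same length pairs elements position by position. The 1.3 dollars-per-pound rate is the same rate used above.

```python
import numpy as np

# Hypothetical prices in pounds (not the actual not_a_painting data)
prices = np.array([10, 20, 30, 40])

in_dollars = prices * 1.3                        # each price times 1.3
plus_five = prices + 5                           # 5 added to every price
plus_fives = prices + np.array([5, 10, 15, 20])  # element-wise: 10+5, 20+10, ...

print(plus_five)   # [15 25 35 45]
print(plus_fives)  # [15 30 45 60]
```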

We can also call built-in Python functions such as `len`, `max`, `min`, and `sum` on arrays, as well as NumPy functions such as `np.mean`.

In [None]:
len(price_in_pounds)

In [None]:
max(price_in_pounds)

In [None]:
min(price_in_pounds)

In [None]:
sum(price_in_pounds)

In [None]:
np.mean(price_in_pounds)
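The mean is just the sum divided by the number of items, so `np.mean` agrees with combining `sum` and `len`. A sketch with hypothetical prices:

```python
import numpy as np

# Hypothetical prices in pounds
prices = np.array([12, 30, 45, 9])

# The mean is the sum divided by the number of items
manual_mean = sum(prices) / len(prices)
print(manual_mean, np.mean(prices))  # 24.0 24.0
```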

In [None]:
price_in_pounds + make_array(1, 2)  # Error: the arrays have different lengths, so elements can't be paired up

In [None]:
price_in_pounds + make_array(1, 2, 3, 4)
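Adding arrays of different lengths fails because NumPy cannot pair up the elements, and it reports this as a `ValueError`. A sketch that catches the error, using a hypothetical four-element array as a stand-in for `price_in_pounds`:

```python
import numpy as np

prices = np.array([10, 20, 30, 40])  # hypothetical stand-in for price_in_pounds

# 4 items vs 2 items: NumPy cannot pair the elements up
mismatch_failed = False
try:
    prices + np.array([1, 2])
except ValueError as err:
    mismatch_failed = True
    print("ValueError:", err)

# Same length: elements are added position by position
matched = prices + np.array([1, 2, 3, 4])
print(matched)  # [11 22 33 44]
```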

*Index* into an array to retrieve an item. Indices start at 0.

In [None]:
price_in_pounds.item(0)

In [None]:
price_in_pounds.item(1)

In [None]:
price_in_pounds.item(3)

Think of `item(n)` as asking for the item that has `n` items before it.
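`item(n)` is a NumPy array method, and square-bracket indexing retrieves the same position. One difference, assuming standard NumPy behavior: `item` returns a plain Python number, while brackets return a NumPy scalar. A sketch with a hypothetical array:

```python
import numpy as np

prices = np.array([10, 20, 30, 40])  # hypothetical prices

first = prices.item(0)   # 10: the item with 0 items before it
fourth = prices[3]       # 40: the item with 3 items before it

print(type(first))      # <class 'int'>
print(type(prices[0]))  # a NumPy integer type, e.g. <class 'numpy.int64'>
```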

The price of the most expensive piece of art sold:

In [None]:
top_price = art.sort('pounds', descending=True).column('pounds').item(0)
top_price
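Sorting and taking the first item is one route to the top price; calling `max` on the column is another that should give the same number. A sketch with a hypothetical array standing in for `art.column('pounds')`:

```python
import numpy as np

pounds = np.array([120, 950, 45, 300])  # hypothetical prices

# Sort descending (np.sort sorts ascending; [::-1] reverses), then take the first item
top_by_sort = np.sort(pounds)[::-1].item(0)

# Or ask for the largest value directly
top_by_max = max(pounds)

print(top_by_sort, top_by_max)  # 950 950
```

The sorting route is handy when you also want the rest of the row (as with `Table.sort`); `max` is enough when only the value matters.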