The contents of this course, including lectures, labs, homework assignments, and exams, have all been adapted from the [Data 8 course at the University of California, Berkeley](https://data.berkeley.edu/education/courses/data-8). Through their generosity and passion for undergraduate education, the Data 8 community at Berkeley has opened their content and expertise for other universities to adapt.

# Chapter 5: Sequences

## Arrays
- An array contains a sequence of values
- Arrays can hold strings or numbers, but all values in an array should be of the same type
- Arithmetic is applied to each element individually
- Array lengths must match for arithmetic between arrays
- A column of a Table is an array
- The function make_array(values) creates an array

### Let's run some code

In [None]:
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

In [None]:
my_array = make_array(1, 2, 3, 4)

In [None]:
my_array

In [None]:
my_array * 2

In [None]:
my_array ** 2

In [None]:
my_array + 1

In [None]:
my_array # array is unchanged

In [None]:
len(my_array)

In [None]:
sum(my_array)

In [None]:
sum(my_array) / len(my_array) # mean: total divided by number of elements

In [None]:
my_array.mean() # same result using the array's own method

In [None]:
another = make_array(60, 70, 80, 90)

In [None]:
my_array + another

In [None]:
yet_another = make_array(5, 6, 7)

In [None]:
my_array + yet_another
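
The cell above is expected to raise an error, because the two arrays have different lengths. As a minimal sketch, assuming plain NumPy arrays behave the same way as the arrays returned by make_array, the error can be caught and inspected (the names four and three are only for illustration):

```python
import numpy as np

four = np.array([1, 2, 3, 4])
three = np.array([5, 6, 7])

try:
    four + three
    lengths_matched = True
except ValueError as err:
    # NumPy cannot pair up the elements of a length-4 and a length-3 array
    lengths_matched = False
    message = str(err)

print(lengths_matched)
print(message)
```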

In [None]:
tunas = make_array('bluefin', 'albacore', 'jim')
tunas

In [None]:
make_array('red fish', 'blue fish', 1, 2)
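
The mixed call above does not fail. As a sketch, assuming make_array follows NumPy's usual type rules (it returns NumPy arrays), the numbers are converted to strings so that every element has the same type:

```python
import numpy as np

mixed = np.array(['red fish', 'blue fish', 1, 2])

# dtype.kind 'U' means every element is stored as a Unicode string
print(mixed.dtype.kind)

# The third element is now the text '1', not the number 1
print(repr(mixed[2]))
```

Because the 1 and 2 are now text, they no longer behave like numbers, which is one reason to keep each array to a single type.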

## Columns of Tables are Arrays ##

In [None]:
skyscrapers = Table.read_table('skyscrapers.csv')
sf = skyscrapers.where('city', 'San Francisco')
sf

In [None]:
sf.select('height')

In [None]:
sf.column('height')

In [None]:
sf.column('height').mean()

In [None]:
den = skyscrapers.where('city', 'Denver')

In [None]:
sf.column('height').mean() - den.column('height').mean()

## Numpy
- The NumPy package is usually abbreviated as ***np***
- NumPy is a powerful package used to manipulate arrays
- [NumPy Documentation](https://numpy.org/doc/stable/reference/)
- [Textbook NumPy function cheatsheet](https://www.inferentialthinking.com/chapters/05/1/Arrays.html#Functions-on-Arrays)

## Ranges
- A range is an array of consecutive numbers
- np.arange(end) == an array of increasing integers from 0 up to, but not including, end
- np.arange(start, end) == an array of increasing integers from start up to, but not including, end
- np.arange(start, end, step) == an array of numbers from start toward end, with step between consecutive numbers (step may be negative or fractional)
- np.arange always includes start and excludes end



In [None]:
np.arange(100)

In [None]:
np.arange(50, 100)

In [None]:
np.arange(2, 9, 2)

In [None]:
np.arange(1.5, -2, -0.5)
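
To make the end-exclusive rule concrete, here is a small sketch that writes out what the ranges above contain (the variable names are only for illustration):

```python
import numpy as np

# 2, 4, 6, 8: the next value, 10, would be past the end
evens = np.arange(2, 9, 2)

# A negative step counts down and stops before reaching -2
countdown = np.arange(1.5, -2, -0.5)

# 50 values: 50 through 99, with 100 excluded
fifty_to_ninety_nine = np.arange(50, 100)

print(evens)
print(countdown)
print(len(fifty_to_ninety_nine))
```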

Let's calculate the sum of the first 10^9 terms of the [Harmonic Series](https://en.wikipedia.org/wiki/Harmonic_series_(mathematics))

In [None]:
value = 10**9 # number of terms; an array this long needs several GB of memory
denom = np.arange(1, value + 1) # denominators 1, 2, ..., value
(1 / denom).sum() # 1/1 + 1/2 + ... + 1/value
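
The cell above builds arrays with 10^9 elements, which can take several gigabytes of memory. A lighter alternative, sketched here with a hypothetical helper named harmonic_sum and an arbitrary chunk size, adds the terms one block at a time:

```python
import numpy as np

def harmonic_sum(n, chunk_size=10**6):
    """Return 1/1 + 1/2 + ... + 1/n, processing chunk_size terms at a time.

    harmonic_sum and chunk_size are illustrative names, not part of NumPy.
    """
    total = 0.0
    for start in range(1, n + 1, chunk_size):
        stop = min(start + chunk_size, n + 1)  # np.arange excludes stop
        total += (1 / np.arange(start, stop)).sum()
    return total

small = harmonic_sum(10)
large = harmonic_sum(10**6, chunk_size=10**5)
print(small)
print(large)
```

Calling harmonic_sum(10**9) computes the same sum as the cell above while holding only one chunk in memory at a time.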

# Chapter 6: Tables
- There are too many Table functions to list here.  
- [Table documentation](http://data8.org/datascience/tables.html)
- Here are some functions to get us started
    - Table.read_table(file_name) to load a .csv file as a Table
    - Table().with_columns('Label_1', Values_1, 'Label_2', Values_2, ...) to create a Table
    - .column('Label') to get the array of values in the Label column
    - .column('Label').sum(), .column('Label').min(), .column('Label').max() to get the sum, minimum, or maximum of the values in the Label column
    - .drop('Label') to drop the Label column from the Table

## Let's create a table from scratch ##

In [None]:
streets = make_array('Bancroft', 'Durant', 'Channing', 'Haste')
streets

In [None]:
Table()

In [None]:
southside = Table().with_column('Streets', streets)
southside

In [None]:
southside.with_column('Blocks from campus', np.arange(4))

In [None]:
southside

In [None]:
southside = southside.with_column('Blocks from campus', np.arange(4))
southside

In [None]:
southside.labels

In [None]:
southside.num_columns

In [None]:
southside.num_rows

## W.E.B. Du Bois demo
 - W.E.B. Du Bois (1868-1963)
 - Scholar, historian, activist, data scientist
 
 ### Income and Expenditure of 150 Black Families in Atlanta, GA
 <img src=du_bois_chart.jpg style="width: 500px;"/>

In [None]:
du_bois = Table.read_table('du_bois.csv')
du_bois

In [None]:
du_bois.select('STATUS')

In [None]:
du_bois.column('STATUS')

In [None]:
du_bois.column('ACTUAL AVERAGE')

In [None]:
du_bois.column('FOOD')

In [None]:
du_bois.column('ACTUAL AVERAGE') * du_bois.column('FOOD')

In [None]:
food_dollars = du_bois.column('ACTUAL AVERAGE') * du_bois.column('FOOD')
du_bois = du_bois.with_column(
    'Food $',
    food_dollars
)
du_bois

In [None]:
du_bois.set_format('FOOD', PercentFormatter)

In [None]:
du_bois.select('CLASS', 'ACTUAL AVERAGE', 'FOOD', 'Food $')

In [None]:
du_bois.column('FOOD')

In [None]:
# Which group ("CLASS") spent the highest percentage on rent?
du_bois.with_column('Percentage on Rent', du_bois.column('RENT')*100).sort('Percentage on Rent', descending=True)

## Selecting data in a column ##

In [None]:
movies = Table.read_table('movies_by_year_with_ticket_price.csv')
movies.show()

In [None]:
gross_in_dollars = movies.column('Total Gross') * 1e6
tix_sold = gross_in_dollars / movies.column('Average Ticket Price')

In [None]:
movies = movies.with_column('Tickets sold', tix_sold)

In [None]:
movies.show(4)

In [None]:
movies.set_format('Tickets sold', NumberFormatter)

In [None]:
movies.plot('Year', 'Tickets sold')

In [None]:
movies.where('Year', are.between(2000, 2005))

In [None]:
movies.where('Year', 2002)

In [None]:
movies.where('Year', are.equal_to(2002))

In [None]:
movies.where('#1 Movie', are.containing('Harry Potter'))

In [None]:
movies.take(np.arange(2, 5))