environment setup:

pip install:

    notebook
    otter-grader
    datascience
    scipy
    pandas
    matplotlib
    ipywidgets

    optional:

        jupyterlab
install:

    OpenSSL 1.1.1 or higher

In [1]:
from datascience import *
from math import *
import numpy as np

import d8error

>References:

https://inferentialthinking.com/chapters/07/Visualization.html

http://www.data8.org/sp22/python-reference.html

# Tables: 

>Table Operations:

<> table.show(n)

    shows the first n rows of table
    !!display only: does not create or mutate any table

<> table.apply(function, col0, (col1), (col2)...)

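    returns an npArray; works similarly to map: function is called once per row with that row's values from the given columns

    - table.apply(function)
        with no columns given, function is called on each whole row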
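
As a rough picture of what `apply` computes, the sketch below uses plain NumPy arrays as stand-ins for table columns, so it runs without the datascience package. The names `cut_off_at_100`, `add`, `ages` and `bonus` are made up for illustration, and this is not the library's implementation, only the same row-by-row idea.

```python
import numpy as np

def cut_off_at_100(x):
    """Cap a value at 100."""
    return min(x, 100)

def add(x, y):
    """Combine one value from each of two columns."""
    return x + y

# Stand-ins for two table columns (hypothetical data).
ages = np.array([17, 117, 52, 100])
bonus = np.array([1, 2, 3, 4])

# One column: call the function once per value and collect the results,
# roughly what table.apply(cut_off_at_100, 'Age') would return.
capped = np.array([cut_off_at_100(a) for a in ages])

# Two columns: the function gets one value from each column per row,
# roughly what table.apply(add, 'Age', 'Bonus') would return.
totals = np.array([add(a, b) for a, b in zip(ages, bonus)])

print(capped.tolist())  # [17, 100, 52, 100]
print(totals.tolist())  # [18, 119, 55, 104]
```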

>>Create new tables out of existing tables:

<> table.select(columns)

    returns a new table out of selected columns
    columns in the new table appear in the order they are given
    columns could be either column labels or column indices

<> table.drop(columns)

    inverse of .select()
    creates a new table without selected columns

<> table.where(column, condition)

    returns a new table with rows meeting the condition
    example: cones.where('Flavor', 'chocolate')

    condition can be a function!
    example: imdb.where('Year', lambda x: x > 2000)

some of the CONDITIONS:

|Predicate|Example|Result|
|-|-|-|
|`are.equal_to`|`are.equal_to(50)`|Find rows with values equal to 50|
|`are.not_equal_to`|`are.not_equal_to(50)`|Find rows with values not equal to 50|
|`are.above`|`are.above(50)`|Find rows with values above (and not equal to) 50|
|`are.above_or_equal_to`|`are.above_or_equal_to(50)`|Find rows with values above 50 or equal to 50|
|`are.below`|`are.below(50)`|Find rows with values below 50|
|`are.between`|`are.between(2, 10)`|Find rows with values above or equal to 2 and below 10|
|`are.between_or_equal_to`|`are.between_or_equal_to(2, 10)`|Find rows with values above or equal to 2 and below or equal to 10|
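
To see which endpoints each predicate includes, here is a small sketch that mimics three of them with NumPy boolean masks. It is only an analogy on a made-up array `values`, not how the `are` predicates are implemented.

```python
import numpy as np

values = np.array([1, 2, 5, 10, 50, 60])  # hypothetical column values

# like are.above(50): strictly greater than 50
above_50 = values[values > 50]

# like are.between(2, 10): includes 2, excludes 10
between_2_10 = values[(values >= 2) & (values < 10)]

# like are.between_or_equal_to(2, 10): includes both ends
between_incl = values[(values >= 2) & (values <= 10)]

print(above_50.tolist())      # [60]
print(between_2_10.tolist())  # [2, 5]
print(between_incl.tolist())  # [2, 5, 10]
```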

<> table.sort(column, descending = True/False)

    returns a new table with rows sorted by column (ascending by default)
    descending is optional and defaults to False
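
A minimal sketch of the same idea with NumPy arrays (the arrays `years` and `titles` are invented stand-ins for two columns): sorting by one column reorders every column the same way, and the originals stay untouched.

```python
import numpy as np

years = np.array([2004, 1994, 2010, 1972])   # column to sort by
titles = np.array(['A', 'B', 'C', 'D'])      # another column of the same rows

order = np.argsort(years)            # row positions that put years in ascending order
ascending = titles[order]            # rows reordered, like table.sort('Year')
descending = titles[order[::-1]]     # reversed, like descending=True

print(ascending.tolist())   # ['D', 'B', 'A', 'C']
print(descending.tolist())  # ['C', 'A', 'B', 'D']
print(titles.tolist())      # ['A', 'B', 'C', 'D'] (original is unchanged)
```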

<> table.take(row_indices)

    returns a new table containing selected rows
    example: table.take(0, 3, 1)
    
    also: 
    table.take(np.arange(...))

<> table.exclude(row_indices)

    returns a new table without the selected rows
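
The difference between `take` and `exclude` can be pictured with NumPy indexing. In this sketch `rows` is a made-up array standing in for the rows of a table, so it is an analogy rather than the Table methods themselves.

```python
import numpy as np

rows = np.array(['r0', 'r1', 'r2', 'r3', 'r4'])  # hypothetical rows

# like take with indices 0, 3, 1: rows come back in the order requested
taken = rows[[0, 3, 1]]

# like table.take(np.arange(3)): the first three rows
first_three = rows[np.arange(3)]

# like exclude with indices 0, 3, 1: everything else, in original order
excluded = np.delete(rows, [0, 3, 1])

print(taken.tolist())        # ['r0', 'r3', 'r1']
print(first_three.tolist())  # ['r0', 'r1', 'r2']
print(excluded.tolist())     # ['r2', 'r4']
```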

<> table.column(column)

    returns an array of selected column
    args could be either column label or index

<> table.relabeled(column, new_label)

    returns a new table with column label changed

>>> Methods related to visualization

<> table.bin(column, bins = some_random_equal_bins)

    similar to group(see below), but groups by bins
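
Counting by bins can be sketched with `np.histogram`, which takes an array of bin edges. The data in `ages` is made up, and the sketch shows only the counting idea, not the exact table that `.bin` returns.

```python
import numpy as np

ages = np.array([1, 5, 12, 18, 19, 25])  # hypothetical data
edges = np.arange(0, 40, 10)             # bin edges 0, 10, 20, 30

# Each bin includes its lower edge and excludes its upper edge
# (in np.histogram the very last bin also includes its upper edge).
counts, _ = np.histogram(ages, bins=edges)

for low, count in zip(edges, counts):
    print(low, count)
# 0 2
# 10 3
# 20 1
```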

<> table.join(column_for_joining, table2, table2_col_for_joining)

<> table.group(column, function = len)

demo:
https://www.youtube.com/watch?v=HLoYTCUP0fc&t=16s

    If function is not specified, group returns a table with only the selected column and an added column holding the row count of each category;
    otherwise, all other columns are kept, each one aggregated by that function within each category.

    In addition, <column> can be a list of columns, which does "cross-classifying" (or "multi-classifying"?): one row for each combination that occurs.
    Note that after grouping with a function, the other columns' labels change slightly (e.g. 'Rating' --> 'Rating average').
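
What grouping does can be sketched in plain Python. The `rows` list of (flavor, rating) pairs is invented for illustration, and the code only imitates the two behaviours described above (counting, and aggregating with a function); it is not the library's code.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (flavor, rating)
rows = [('chocolate', 4), ('vanilla', 3), ('chocolate', 5), ('strawberry', 2)]

# No function given: one entry per category plus its row count.
counts = Counter(flavor for flavor, _ in rows)

# With a function (here an average): gather each category's values, then reduce them.
ratings = defaultdict(list)
for flavor, rating in rows:
    ratings[flavor].append(rating)
averages = {flavor: sum(vals) / len(vals) for flavor, vals in ratings.items()}

print(dict(counts))  # {'chocolate': 2, 'vanilla': 1, 'strawberry': 1}
print(averages)      # {'chocolate': 4.5, 'vanilla': 3.0, 'strawberry': 2.0}
```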

<> table.pivot(col1, col2, values = column_name, function = len)

demo:
https://www.youtube.com/watch?v=4WzXo8eKLAg&t=15s

    Another approach to cross-classifying: .group uses two cols to represent the classifiers, while .pivot uses a grid, where the first row and the first col serve as classifiers!
    After this, the values of col1 become the column labels (the header row) and the values of col2 become the row labels (the first column).

    <values> specifies which col's values to aggregate; <function> works the same as in .group

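The grid idea can be sketched in plain Python with nested dictionaries. The `rows` data is made up, and the sketch assumes col1's values run across the top and col2's values run down the side, with counts in the cells.

```python
from collections import Counter

# Hypothetical rows: (flavor, color)
rows = [('chocolate', 'dark'), ('chocolate', 'light'),
        ('vanilla', 'light'), ('chocolate', 'dark')]

pair_counts = Counter(rows)
flavors = sorted({f for f, _ in rows})  # col1 values: labels across the top
colors = sorted({c for _, c in rows})   # col2 values: labels down the side

# grid[color][flavor] = number of rows with that combination (0 if none)
grid = {c: {f: pair_counts[(f, c)] for f in flavors} for c in colors}

print(grid['dark'])   # {'chocolate': 2, 'vanilla': 0}
print(grid['light'])  # {'chocolate': 1, 'vanilla': 1}
```

Unlike grouping by two columns, the grid has a cell for every combination, including those that never occur (here vanilla/dark, with count 0).
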
>Table Properties:

<> table.num_rows
    
    And:
    
    table.num_columns

<> table.labels

    returns a tuple containing the labels of all columns

>Create A Table:

<> Table.read_table('path')

<> with_column and with_columns methods

In [None]:
t = Table()

In [None]:
streets = make_array('Bancroft', 'Durant', 'Channing', 'Haste')
southside = t.with_column('Street name', streets)
southside

In [None]:
# with_column doesn't mutate the table
t

In [None]:
southside.with_column('City', 'Berkeley')

In [None]:
t.with_columns(
    'Street name', streets,
    'Blocks from campus', np.arange(4),
    'Time to get there', np.arange(1, 8, 2)
    )

<> with_row / with_rows(list)

> Others:

`table.sample(n

# Numpy Arrays:

An array is a sequence of values of the same type.

## Creating an array:

In [2]:
# make an array:
first_four = make_array(1, 2, 3, 4)
first_four

array([1, 2, 3, 4], dtype=int64)

In [None]:
type(first_four)

make an array using RANGES:

`np.arange((start), end, (step))`

In [None]:
np.arange(6)

In [None]:
np.arange(1, 11, 2)

In [None]:
np.arange(0, 1, 0.1)

## Array Operations: 

In [None]:
np.average(first_four)

In [None]:
np.sum(first_four)

In [None]:
# built-in functions often work on arrays too
sum(first_four)

In [None]:
first_four.item(0)

In [None]:
# list-style indexing works too
first_four[0]

In [3]:
# note that the data types are different
type(first_four.item(0)), type(first_four[0])

(int, numpy.int64)

In [None]:
[n + 1 for n in first_four]

In [None]:
len(first_four)

`np.append(npArr, item)`

returns a new array with `item` appended to the end of `npArr`.

_**This does not mutate `npArr` !!**_ Assign the result if you want to keep it.
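
A short check of that behaviour, using a made-up array `arr`:

```python
import numpy as np

arr = np.array([1, 2, 3])

np.append(arr, 4)             # the result is thrown away
before = arr.tolist()         # arr is still [1, 2, 3]

arr = np.append(arr, 4)       # assign the result to keep the new item
after = arr.tolist()

print(before)  # [1, 2, 3]
print(after)   # [1, 2, 3, 4]
```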

`np.random.choice(npArr, size = None)`

With the default, it returns one randomly chosen item of `npArr`. If `size` is given, it returns an array of `size` random choices (drawn with replacement by default).
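
A small sketch of both call styles; the `flavors` array is invented for illustration and the results differ from run to run.

```python
import numpy as np

flavors = np.array(['chocolate', 'vanilla', 'strawberry'])

one = np.random.choice(flavors)      # a single randomly chosen item
five = np.random.choice(flavors, 5)  # an array of 5 choices; repeats are possible

print(one)
print(five)
```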

>Arithmetic Operations: 

In [None]:
next_four = make_array(5, 6, 7, 8)
next_four

In [None]:
first_four + next_four

In [None]:
first_four * next_four

In [None]:
first_four * 4

In [None]:
first_four + 4 == next_four

> Boolean operation

`npArr == ...` gives an _array_ of `True`s and `False`s.

`True` and `False` behave as 1 and 0 in arithmetic. For example, `np.sum(npArr == ...)` returns how many items in `npArr` are equal to `...`.
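
For instance, with a made-up array `arr`, counting and taking the proportion of matching items looks like this:

```python
import numpy as np

arr = np.array([3, 7, 3, 9, 3])

mask = arr == 3              # array of True/False, one per item
how_many = np.sum(mask)      # True counts as 1, False as 0
proportion = np.mean(mask)   # fraction of items equal to 3

print(mask.tolist())  # [True, False, True, False, True]
print(how_many)       # 3
print(proportion)     # 0.6
```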

# Visualizations:

Line Plot>>>

    table.plot(column_for_x_axis, column(s)_for_y_axis = all_other_cols)


Bar Chart>>>

    table.barh(column_name_of_categories, values = column_name_to_aggregate)

Histogram>>>

    table.hist(column_for_x_axis, unit = 'Unit', bins = random_equal_bins)

"bins" is given as an npArray of bin edges (for example, one made with np.arange)

Scatter Plot>>>

    table.scatter(column_for_x_axis)

# Others:

_ = interact(function_name, function_inputs_in_npArray)

    creates an interactive mode of a function

Type function_name? (adding the question mark) to see the docstring.
    
    Shift + Tab after clicking on a name does the same.