Do we remember our Python?
--------------------------

Remembering Python and good programming practices

* docstrings
* variable names
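
As a quick refresher on the two points above, here is a minimal sketch of a function with a docstring and descriptive variable names. The function name `mean_petal_length` and the sample values are made up for illustration; they are not part of the course code.

```python
def mean_petal_length(petal_lengths_cm):
    """Return the arithmetic mean of a list of petal lengths.

    Parameters
    ----------
    petal_lengths_cm : list of float
        Petal lengths in centimetres; must not be empty.

    Returns
    -------
    float
        The mean petal length in centimetres.
    """
    if not petal_lengths_cm:
        raise ValueError("petal_lengths_cm must contain at least one value")
    return sum(petal_lengths_cm) / len(petal_lengths_cm)


# Descriptive names (with units) make the call site easy to read.
sample_petal_lengths_cm = [1.4, 1.3, 1.5]  # made-up illustrative values
print(mean_petal_length(sample_petal_lengths_cm))

# The first docstring line is what help() shows as the summary.
print(mean_petal_length.__doc__.splitlines()[0])
```

Calling `help(mean_petal_length)` prints the whole docstring, which is the same mechanism used by the `help(...)` calls later in these notes.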


Do we know how GitHub works?
----------------------------------------------

Git-based workflow:
* Make your own copy on GitHub: fork the repository (use the Fork button on GitHub; there is no `git fork` command)
* Get the code to work on: git clone
* Repeat roughly every 30 minutes:
    * Work, work (not too long!)
    * Save your changes: git add, then git commit
    * Put your changes on GitHub: git push origin main

Git-based access control:
* SSH keys (preferred!)
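  * open a terminal in VS Code
  * type ssh-keygen and accept the defaults
  * type cat ..\..\.ssh\id_rsa.pub (adjust the path so it points at the .ssh folder in your home directory; depending on the key type, the file may be named id_ed25519.pub instead)
  * copy what you see
  * in a browser, go to https://github.com/settings/ssh/new
  * paste it in and name it "my computer" or similar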
* Tokens

(See also: https://docs.github.com/en/authentication/connecting-to-github-with-ssh; https://code.visualstudio.com/docs/editor/versioncontrol)


Looking at Data with numpy
----------------------------------------

Do *not* use this for your project (yet!)

In [None]:
import numpy
import seaborn

In [None]:
# Let's load the iris data

iris = seaborn.load_dataset('iris')

iris

In [None]:
help(iris)

What are some types of data?
----------------------------

| Type | Example Values | Example Application |
|------|----------------|---------------------|
| Quantitative Continuous | sepal_length |  |
| Quantitative Discrete | number in a bunch |  |
| Qualitative Nominal | colors, species |  |
| Qualitative Ordinal | sizes (small, medium, large) |  |

What are the types of the columns in the iris dataset?

What about these forms of data?
* Text
* Dates/times
* Images

In [None]:
# Let's get some summary statistics for this data

iris.agg(['min', 'mean', 'median', 'max', 'var'])

In [None]:
# What about features / variables that are not numeric?

iris['species'].unique()
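
For a non-numeric (nominal) feature, means and variances do not apply, but counts and the most frequent value still do. Below is a minimal sketch in plain Python, using a small made-up list of labels rather than the real iris column. In pandas, `iris['species'].value_counts()` gives the same kind of table.

```python
from collections import Counter
from statistics import mode

# A small made-up sample of species labels (NOT the real iris column).
species_sample = [
    "setosa", "virginica", "setosa",
    "versicolor", "setosa", "virginica",
]

category_counts = Counter(species_sample)    # how often each label occurs
most_common_species = mode(species_sample)   # the most frequent label

print(sorted(category_counts))               # the distinct categories
print(category_counts.most_common())         # (label, count) pairs, largest first
print(most_common_species)
```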

Summary statistics can mislead
------------------------------

The example below comes from this great seaborn documentation: https://seaborn.pydata.org/examples/anscombes_quartet.html

In [None]:
seaborn.set_theme(style="ticks")

# Load the example dataset for Anscombe's quartet
df = seaborn.load_dataset("anscombe")

df

In [None]:
# What if we grouped the Anscombe data by dataset and then got summary statistics for each group?

df.groupby('dataset').agg(['min', 'mean', 'median', 'max', 'var'])

In [None]:
# Show the results of a linear regression within each dataset
seaborn.lmplot(x="x", y="y", col="dataset", hue="dataset", data=df,
           col_wrap=2, ci=None, palette="colorblind", height=4,
           scatter_kws={"s": 50, "alpha": 1})

What... just happened?
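
One way to unpack it: the four datasets share nearly the same means, variances, and least-squares line, even though their scatter plots look nothing alike. The sketch below recomputes those numbers in plain Python. The values are Anscombe's published quartet typed in by hand (so the cell runs without downloading anything), and the helper `least_squares_line` is a name made up for this illustration.

```python
from statistics import mean, variance

# Anscombe's quartet, entered by hand. Datasets I-III share the same x values.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_shared,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares fit of ys on xs."""
    x_mean, y_mean = mean(xs), mean(ys)
    covariance_sum = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_spread_sum = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance_sum / x_spread_sum
    return slope, y_mean - slope * x_mean


summary = {}
for name, (xs, ys) in quartet.items():
    slope, intercept = least_squares_line(xs, ys)
    # Order: mean x, variance x, mean y, variance y, slope, intercept
    summary[name] = (mean(xs), variance(xs), mean(ys), variance(ys), slope, intercept)
    print(name, [round(value, 2) for value in summary[name]])
```

The printed rows are almost identical across the four datasets, which is why the plots, not the summary table, reveal the curve in II and the outliers in III and IV.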

Let's explore data visualization
----------------------

In [None]:
iris = seaborn.load_dataset('iris')
 
seaborn.pairplot(data=iris, x_vars=["petal_width"], y_vars=["petal_length"], height=5)

In [None]:
help(seaborn.pairplot)

What are some types of visualization?
----------------------------------------------------

Hint: What happens if you go to https://seaborn.pydata.org/examples/index.html or type help(seaborn)?

In [None]:
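# Let's explore seaborn: type "seaborn." and press Tab to list what is available,
# then ask for help on one function, for example:
help(seaborn.scatterplot)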

Lying with visualizations
-------------------------

Hint: https://uxdesign.cc/a-beginners-guide-to-identifying-misleading-data-visualizations-d82a93211ac6

Later on we will learn about weights and biases

Being a good visualization creator
----------------------------------

* https://uxdesign.cc/how-to-design-data-visualizations-that-are-actually-valuable-e8b752835b9a
* https://www.tableau.com/about/blog/examining-data-viz-rules-dont-use-red-green-together
* https://seaborn.pydata.org/tutorial/color_palettes.html