# Let's explore data visualization

I will use a package called *pandas* and a Python visualization package called *seaborn*. Seaborn is built on top of *matplotlib*; *plotly*, which I may also use in the future, is a separate library that does not depend on matplotlib.

In your project, you may **not** use pandas. But you may use matplotlib.

In [None]:
import pandas as pd
import seaborn

carsdf = pd.read_csv('data/vehiclesNumeric.csv')

seaborn.pairplot(data=carsdf, x_vars=["price"], y_vars=["year"], height=5).set(title="Craigslist used car prices against car year of manufacture")

* What is a pairplot?
* What are some other types of visualization? (Hint: What happens if you go to https://seaborn.pydata.org/examples/index.html or type help(seaborn)?)

It's quite easy to make misleading visualizations! See: https://uxdesign.cc/a-beginners-guide-to-identifying-misleading-data-visualizations-d82a93211ac6

Be a good visualization creator!
* use the right type of visualization (hint: pie charts are very rarely the right type of visualization!)
* number and label your axes
* provide an informative caption / label
* unless there's a really good reason, start all axes at 0
* use color schemes that work for people who are color blind
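
Here is a minimal matplotlib sketch of the checklist above (matplotlib only, since that is what you may use in your project). The numbers are made up for illustration and are not taken from the Craigslist data; the two hex colors are an assumption on my part, taken from the Okabe-Ito palette, which is commonly recommended for color-blind-friendly plots.

```python
import matplotlib.pyplot as plt

# Made-up illustrative numbers, NOT from data/vehiclesNumeric.csv
years = [2015, 2016, 2017, 2018, 2019]
sedan_prices = [9000, 10500, 12000, 14000, 16500]
truck_prices = [15000, 17000, 19500, 22000, 25000]

fig, ax = plt.subplots(figsize=(6, 4))

# Blue and vermillion from the Okabe-Ito palette; different markers help too
ax.plot(years, sedan_prices, marker="o", color="#0072B2", label="sedan")
ax.plot(years, truck_prices, marker="s", color="#D55E00", label="truck")

# Label both axes (with units) and give the plot an informative title
ax.set_xlabel("Year of manufacture")
ax.set_ylabel("Asking price (USD)")
ax.set_title("Hypothetical asking price by year of manufacture")

# Start the price axis at 0 so differences are not exaggerated
ax.set_ylim(bottom=0)
ax.set_xticks(years)
ax.legend()
```

The year axis does not start at 0 here, which is one of those "really good reason" cases: a year axis running from 0 to 2019 would squash all the data into the right edge.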



Other resources:
* https://uxdesign.cc/how-to-design-data-visualizations-that-are-actually-valuable-e8b752835b9a
* https://www.tableau.com/about/blog/examining-data-viz-rules-dont-use-red-green-together
* https://seaborn.pydata.org/tutorial/color_palettes.html

# Introduction to numpy: Making and looking at numpy arrays

https://numpy.org/


https://numpy.org/doc/stable/ <-- look here!!

- numpy is the computer representation of linear algebra


In [None]:
import numpy as np

## Data Types in numpy

| Programming name | Math name |
|------|------|
| ?d array | vector |
| ?d array | matrix |
| ?d array | tensor |

GPUs and TPUs are set up to operate on vectors and matrices  

*What is a GPU and what is a TPU?*


## Let's make vectors, matrices and tensors using numpy

In [None]:
# let's make a vector
v = np.array([1, 2, 3])
print(v)

In [None]:
# let's make a matrix
m = np.array([[1,2,3], [4,5,6]])
print(m)

In [None]:
# let's make a tensor
t = np.array([[[1,2,3,4], [5,6,7,8]], [[9,10,11,12], [13,14,15,16]],  [[17,18,19,20], [21,22,23,24]]])
print(t)

## Getting information about numpy arrays

### The type of a numpy array

All the elements in a numpy array must have the same type (so you can't mix integers, floats, strings...).
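
A small sketch of what "same type" means in practice: if you hand `np.array` a mix of types, numpy does not complain, it quietly picks one type that can hold everything. The variable names here are just for illustration.

```python
import numpy as np

only_ints = np.array([1, 2, 3])
ints_and_floats = np.array([1, 2.5, 3])
with_a_string = np.array([1, 2.5, "three"])

print(only_ints.dtype)        # an integer type (exact width depends on platform)
print(ints_and_floats.dtype)  # float64: the ints were converted to floats
print(with_a_string.dtype)    # a unicode string type: everything became a string
print(with_a_string)
```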

In [None]:
# how do I figure out the type of a numpy array? 
print("v type", v.dtype)

In [None]:
# how do I change the type of a numpy array?
vFloat = v.astype(float)
print("vFloat type", vFloat.dtype)
print("vFloat\n", vFloat)

# make a new array from vFloat, coercing its elements into strings
vString = np.array(vFloat, dtype=str)
print("vString type", vString.dtype)
print("vString\n", vString)

# try to coerce the strings back into floats
vFloat = vString.astype(float)
print("vFloat type", vFloat.dtype)
print("vFloat\n", vFloat)
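
Coercing strings into floats only works when every string looks like a number. The sketch below uses made-up arrays to show one conversion that succeeds and one that raises a `ValueError`.

```python
import numpy as np

# every element parses as a number, so the conversion succeeds
numeric_strings = np.array(["1.5", "2", "3e2"])
converted = numeric_strings.astype(float)
print(converted)

# "abc" is not a number, so numpy raises a ValueError
mixed_strings = np.array(["1.5", "abc"])
conversion_failed = False
try:
    mixed_strings.astype(float)
except ValueError as err:
    conversion_failed = True
    print("conversion failed:", err)
```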

In [None]:
# now you make it into an int array and print
# going from float to int will truncate, so be careful with this
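
To see what "truncate" means, here is a sketch on a separate made-up array (so the exercise above is still yours to do). Truncation drops the fractional part, which is not the same as rounding, and not the same as `np.floor` for negative numbers.

```python
import numpy as np

values = np.array([1.7, 2.2, -1.7])

truncated = values.astype(int)          # drops the fractional part
rounded = np.round(values).astype(int)  # rounds to the nearest integer first
floored = np.floor(values).astype(int)  # always rounds down

print(truncated)  # [ 1  2 -1]
print(rounded)    # [ 2  2 -2]
print(floored)    # [ 1  2 -2]
```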

### Getting information about the size of a numpy array

In [None]:
# let's take a look at ndim, size, shape and dtype of v
print(v.ndim)
print(v.size) 
print(v.shape)
print(v.dtype)

In [None]:
# let's take a look at ndim, size, shape and dtype of m
print(m.ndim)
print(m.size)
print(m.shape)
print(m.dtype)

In [None]:
# let's take a look at ndim, size, shape and dtype of t
print(t.ndim)
print(t.size)
print(t.shape) 
print(t.dtype)
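
The three attributes are related: `ndim` is the length of `shape`, and `size` is the product of the entries in `shape`. A quick sketch that checks this on a fresh array with the same shape as `t` (the name `example` is just for illustration):

```python
import numpy as np

example = np.zeros((3, 2, 4))

print(example.shape)  # (3, 2, 4)
print(example.ndim)   # 3
print(example.size)   # 24

# ndim is how many numbers are in shape; size is those numbers multiplied together
print(example.ndim == len(example.shape))
print(example.size == np.prod(example.shape))
```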

In [None]:
# let's save and load a numpy array
import pickle as pkl

# save to a pickle file (m is the matrix we made above; any numpy array works)
with open('data.pkl','wb') as f:
    pkl.dump(m, f)

# load from a pickle file
with open('data.pkl','rb') as f:
    nparray2 = pkl.load(f)
    print(nparray2)
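
Pickle is not the only option. numpy also has its own `np.save` and `np.load`, which write a single array to a `.npy` file. A minimal sketch, using a temporary directory and a made-up file name so it does not leave files behind:

```python
import os
import tempfile

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.npy")  # hypothetical file name
    np.save(path, arr)      # write the array in numpy's .npy format
    loaded = np.load(path)  # read it back into memory

print(loaded)
print(np.array_equal(arr, loaded))  # same values after the round trip
```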

## From python lists to numpy arrays

In [None]:
# let's make a numpy array from a python list
pylist = [[10, 1, 2021], [2, 7, 2023]] 
nparray = np.array(pylist)
print("python list\n", pylist)
print("numpy array\n", nparray)
print("shape\n", nparray.shape)

In [None]:
# let's convert a numpy array to a python list
backtolist = nparray.tolist()
print("back to a python list\n", backtolist)

In [None]:
# here's something we CANNOT do
pylist = [[10, 1, 2021], [2, 7]] 
nparray = np.array(pylist)

*What is "inhomogeneous"?*

*What must be true of a list we want to make into a numpy array?*
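
Once you have thought about the questions above, here is one way you might check a list of lists before handing it to `np.array`. `is_rectangular` is a hypothetical helper I wrote for this sketch, not part of numpy, and it only looks at the first level of nesting.

```python
import numpy as np

def is_rectangular(rows):
    """Return True if every row in a list of lists has the same length."""
    return len({len(row) for row in rows}) <= 1

good = [[10, 1, 2021], [2, 7, 2023]]
bad = [[10, 1, 2021], [2, 7]]

print(is_rectangular(good))  # True
print(is_rectangular(bad))   # False

# only convert when the check passes
if is_rectangular(good):
    print(np.array(good).shape)  # (2, 3)
```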

## Making numpy arrays without listing all the elements

### Zeros

In [None]:
# make an array of zeros
nparrayZero = np.zeros([3, 10])  # 3 by 10 array of zeros
print(nparrayZero)

In [None]:
# that's floats ... what if we want ints?
nparrayZero.astype(int)  # astype returns a NEW array; here the result is thrown away
print(nparrayZero)  # still floats: nparrayZero itself was not changed

In [None]:
nparrayZero = nparrayZero.astype(int) 
print(nparrayZero)  # works, because we assigned the result of astype back to the name

In [None]:
# another way to get ints
nparrayZeroInt = np.array(nparrayZero, dtype=int)
print(nparrayZeroInt)
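
A third option is to ask for ints up front: `np.zeros` (like `np.ones`) accepts a `dtype` argument. The sketch below also shows that `astype` leaves the original array untouched. The variable names are mine.

```python
import numpy as np

zeros_float = np.zeros([3, 10])          # default dtype is float64
zeros_int = np.zeros([3, 10], dtype=int) # ask for ints when creating the array

converted = zeros_float.astype(int)      # a new array; zeros_float is unchanged

print(zeros_float.dtype)  # float64
print(zeros_int.dtype)    # an integer type
print(converted.dtype)    # an integer type
```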

### Ones

In [None]:
# what if we want ones instead of zeros?

nparrayOnes = np.ones([3,10])
print(nparrayOnes)

### More than ones

In [None]:
# what if we want sevens?
nparraySevens = ??
print(nparraySevens)

### Random values

In [None]:
# make an array of random values
nparrayRandomFloat = np.random.random([3, 10])
print(nparrayRandomFloat)
print(nparrayRandomFloat.dtype)

In [None]:
# what if we want random ints? let's see...
nparrayRandomInt = np.random.random([3, 10], dtype=int) # raises a TypeError: random() has no dtype argument; it only returns floats in [0, 1)
print(nparrayRandomInt)

In [None]:
# hmm, if not that then what?
# scale the floats from [0, 1) up to [0, 10), then truncate to ints
print((nparrayRandomFloat * 10).astype(int))

In [None]:
# another way; let's take a look at the arguments here
nparrayRandomInt = np.random.randint(1, 10, [3,7])
print(nparrayRandomInt)
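
About those arguments: the first is the lowest value you can get (inclusive), the second is the upper bound (exclusive), and the third is the shape. Newer numpy code often uses a random `Generator` instead of `np.random.randint`; the sketch below shows the equivalent call. The seed value 0 is an arbitrary choice so the output is repeatable.

```python
import numpy as np

# legacy interface: ints from 1 up to (but not including) 10
legacy_ints = np.random.randint(1, 10, [3, 7])

# Generator interface: same bounds, same shape
rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for repeatability
generator_ints = rng.integers(1, 10, size=(3, 7))

print(legacy_ints)
print(generator_ints)
print(generator_ints.min() >= 1, generator_ints.max() <= 9)
```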

In [None]:
# yet another way!
nplinArray = np.linspace(0,10,10, dtype=int).reshape(2,5) #ints but not random
print(nplinArray)
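
One thing to watch for: `np.linspace` with `dtype=int` first computes evenly spaced floats and then drops the fractional part, so the resulting ints are not always evenly spaced. A sketch comparing a few options (`np.arange` is a different numpy function that I am bringing in here for comparison):

```python
import numpy as np

# 10 points from 0 to 10 are spaced 10/9 apart, which is not a whole number
as_floats = np.linspace(0, 10, 10)
as_ints = np.linspace(0, 10, 10, dtype=int)
print(as_floats)
print(as_ints)  # note the jump from 8 to 10 at the end

# 11 points from 0 to 10 are spaced exactly 1 apart
eleven = np.linspace(0, 10, 11, dtype=int)
print(eleven)

# np.arange gives evenly spaced ints directly (the stop value is excluded)
print(np.arange(0, 10).reshape(2, 5))
```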

In [None]:
# now you try: what if we want 9 floats (not random) between .2 and .6, as a 3 x 3 numpy array?

# what if we want to shape that into a 2 by 5 array?
nplinArray = np.linspace(??).reshape(??)
print(nplinArray)