# Intro to Python

Here are some extra resources for learning Python:

**Getting Started with Python**:

* https://www.codecademy.com/learn/python
* http://do1.dr-chuck.com/pythonlearn/EN_us/pythonlearn.pdf
* http://docs.python-guide.org/en/latest/intro/learning/
* https://learnpythonthehardway.org/book/
* https://www.codementor.io/learn-python-online

**Learning Python in Notebooks**:

* http://mbakker7.github.io/exploratory_computing_with_python/

This is handy to always have available for reference:

**Python Reference**:

* https://docs.python.org/3.5/reference/


There are also Python courses in the MDST datacamp!

## 0. Jupyter Notebook

Welcome to Jupyter Notebook! Jupyter lets you develop documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

All our cells so far are _markdown_ cells, meaning they just contain text! You can enter edit mode on them by pressing ENTER.

In [None]:
# Jupyter also has code cells, like this one! You can write and run a single line of code or more
# In python, the '#' symbol creates a comment line
# To run a cell, press CTRL-ENTER or SHIFT-ENTER (also moves down to the next cell)

In [1]:
# Below, we define the variable a to be the number 1
a = 1

Now, the variable we defined is accessible from any other code cell.

In [None]:
a

Some basics:

In [2]:
print('Hello World!')

Hello World!


In [3]:
# The notebook executes every line in a cell, but only displays the value of the last expression
a = a + 1
a
a + 1

3

In [None]:
# You can get around this by calling print explicitly
print(a)
print(a + 1)

# 0. Imports

Python has tons of cool, prewritten libraries that implement code for you! All you have to do is import them

In [4]:
import antigravity

Importing most libraries has no visible effect -- the import just gives you a module whose functions you can call by name

In [None]:
import time
time.time()

# be careful not to overwrite the library by creating a new variable of the same name...
# e.g. time = 2 will overwrite the library time and you won't be able to use its functions unless you import it again

You can import individual functions and not have to use the module name...

In [None]:
from random import random
random()

# if you just ran `import random`, you would have to run random.random()

or import all the functions within a library with *

In [5]:
from math import *
sqrt(3)

1.7320508075688772
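
One thing to watch with `*` imports: every public name in the module lands in your namespace, where it can quietly replace a built-in of the same name. Here is a small sketch of the difference between the built-in `pow` and `math.pow` (the variable names are just for illustration). Keeping the module name in front makes it obvious which one you are calling.

```python
import builtins
import math

# The built-in pow keeps integers as integers
builtin_result = builtins.pow(3, 3)

# math.pow always converts its arguments to float
math_result = math.pow(3, 3)

print(builtin_result, type(builtin_result).__name__)
print(math_result, type(math_result).__name__)
```

After `from math import *`, a bare `pow` refers to `math.pow`, so you would get the float version.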

You can also import your own files... we will see this in the first checkpoint!

## 1. Data Types

### 1.0 Dynamic Typing

Python does not require you to specify a variable's data type when you create it -- the type is inferred automatically from the value you assign

In [6]:
x = 1
type(x)

int

In other languages such as C++, Java, etc., defining a variable looks more like this (which is a syntax error in Python):

In [7]:
int i = 3

SyntaxError: invalid syntax (2398013629.py, line 1)

This also means a variable's data type can change

In [8]:
x = str(x)
print(x)
type(x)

1


str

In [9]:
x = [1,2,3]
type(x)

list

### 1.1 Numbers

This is an integer...

In [10]:
1

1

In [11]:
type(0)

int

so is this

In [12]:
-3

-3

This is a float

In [None]:
2.0

In [None]:
type(0.0)

You can do math with them

In [None]:
3 + 2

In [None]:
1.1 - 9.0

In [None]:
1 * 5

In [None]:
# float division
1 / 2

In [None]:
# integer division
13 // 5

In [None]:
# modulus (remainder)
13 % 5

In [None]:
# exponent
4 ** 2

ints and floats are mostly interchangeable

In [None]:
3 * 3.0

In [None]:
9.8 // 2

but can also be cast (i.e. converted) to the other type

In [None]:
float(3)

In [None]:
int(2.9)
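
A few of these operations are easy to misread, so here is a small sketch (the variable names are made up for illustration). The main point: `//` rounds down, while `int()` simply drops the fractional part, so the two disagree on negative numbers.

```python
# Floor division rounds toward negative infinity
floor_pos = 13 // 5       # 2
floor_neg = -13 // 5      # -3

# int() truncates toward zero instead
trunc_pos = int(2.9)      # 2
trunc_neg = int(-2.9)     # -2

# Floor division involving a float returns a float
float_floor = 9.8 // 2    # 4.0

# divmod returns the quotient and the remainder together
quotient, remainder = divmod(13, 5)   # 2 and 3

print(floor_pos, floor_neg, trunc_pos, trunc_neg, float_floor, quotient, remainder)
```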

### 1.2 Strings

This is a string...

In [None]:
s = "apple"
s

so is this

In [None]:
'apple'

### 1.3 Boolean Values

In [None]:
True

In [None]:
False

### 1.4 Data Structures

#### 1.4.1 Lists

This is a list

In [None]:
l = [42, 7, 13, 24601, 2001, 3.50]

In [None]:
len(l)

In [None]:
r = [32, 'wse', True]

In [None]:
len(r)

#### _a sidenote on indexing_

Python uses 0-indexing

In [None]:
l[0]

In [None]:
l[5]

Negative numbers index from the end

In [None]:
l[-1]

You can index lists to get individual items...

In [None]:
l[3]

and strings to get individual characters (substring of length 1)

In [None]:
'string'[3]

You can use indexing to get subarrays/substrings

syntax: [start:end:step]

the subarray will include the start index but not the end

In [None]:
# the second and third elements
print(l[1:3])
print([l[1],l[2]])

In [None]:
# the last 8 characters
'this is a sentence'[-8:]

In [None]:
# every other element
l[::2]
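
Here is a quick sketch of the `[start:end:step]` syntax on a fresh list (`nums` is a made-up example). Any of the three parts can be left out, and a negative step walks backwards.

```python
nums = [10, 20, 30, 40, 50, 60]

first_three = nums[:3]     # start defaults to 0 -> [10, 20, 30]
from_third = nums[2:]      # end defaults to the length -> [30, 40, 50, 60]
middle = nums[1:5:2]       # indices 1 and 3 -> [20, 40]
backwards = nums[::-1]     # negative step -> reversed copy

# Slicing works the same way on strings
text_tail = "this is a sentence"[-8:]   # 'sentence'

print(first_three, from_third, middle, backwards, text_tail)
```

Slicing always returns a new list or string, so `nums` itself is unchanged.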

#### back to lists

Elements can be added...

In [None]:
l.append(7)
l

removed (the first occurrence)...

In [None]:
l.remove(7)
l

and modified

In [None]:
l[0] = 100
l

You can make lists like this:

In [13]:
list1 = list()
list1

[]

In [14]:
list2 = []
list2

[]

In [15]:
list3 = [10,30,20]
list3

[10, 30, 20]

In [16]:
# list4 uses the range function, which generates a sequence of numbers
list4 = list(range(3))
list4

[0, 1, 2]

In [17]:
# list5 uses a list comprehension -- it's like a one-line for loop
list5 = [i*2 for i in list4]
list5

[0, 2, 4]
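
To make the "one-line for loop" idea concrete, here is a sketch that builds the same list both ways (the names `numbers`, `doubled_loop`, `doubled`, and `evens` are just for illustration). A comprehension can also filter with an `if` at the end.

```python
numbers = list(range(5))   # [0, 1, 2, 3, 4]

# The explicit for-loop version
doubled_loop = []
for n in numbers:
    doubled_loop.append(n * 2)

# The equivalent list comprehension
doubled = [n * 2 for n in numbers]

# A comprehension with a condition: keep only the even numbers
evens = [n for n in numbers if n % 2 == 0]

print(doubled_loop, doubled, evens)
```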

#### 1.4.2 Tuples

This is a python tuple

In [18]:
t = (1, 2, 3)
t

(1, 2, 3)

You can index them like lists

In [19]:
t[0]

1

but CAN'T modify them (they are immutable)

In [20]:
t[2] = 10

TypeError: 'tuple' object does not support item assignment

In [21]:
t.append(10)

AttributeError: 'tuple' object has no attribute 'append'

In most cases lists are more useful, but the immutable property of tuples means they can be used as dictionary keys, in sets, etc.
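
As a rough sketch of what that means in practice (the `grid` and `visited` names and their values are invented for this example): a tuple works as a dictionary key or a set element, while a list in the same position raises a `TypeError`.

```python
# Tuples are hashable, so they can be dictionary keys
grid = {(0, 0): "origin", (1, 2): "treasure"}
label = grid[(1, 2)]

# ...and members of a set
visited = {(0, 0), (0, 1)}
visited.add((0, 0))   # already present, so the set does not grow

# Lists are mutable and therefore not hashable
try:
    bad = {[0, 0]: "origin"}
    list_key_worked = True
except TypeError:
    list_key_worked = False

print(label, len(visited), list_key_worked)
```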

#### 1.4.3 Dictionaries

Dictionaries store (key: value) pairs

In [22]:
d = {"apple": "a fruit", "basil": "an herb", "monkey": "a mammal"}
d

{'apple': 'a fruit', 'basil': 'an herb', 'monkey': 'a mammal'}

In [23]:
d.keys()

dict_keys(['apple', 'basil', 'monkey'])

In [24]:
d.values()

dict_values(['a fruit', 'an herb', 'a mammal'])

They are indexed by their keys and return the corresponding values

In [25]:
d['basil']

'an herb'
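
Here is a short sketch of a few other everyday dictionary operations, using the same kind of data (the added entries are made up). Note that indexing a missing key with `d[...]` raises a `KeyError`, while `.get` returns `None` or a fallback you choose.

```python
d = {"apple": "a fruit", "basil": "an herb", "monkey": "a mammal"}

d["carrot"] = "a vegetable"    # assigning to a new key adds a pair
d["basil"] = "a leafy herb"    # assigning to an existing key overwrites it

has_apple = "apple" in d       # membership tests check the keys
missing = d.get("zebra")                 # None instead of a KeyError
fallback = d.get("zebra", "unknown")     # or a default of your choice

# .items() yields (key, value) pairs for looping
for key, value in d.items():
    print(key, "->", value)
```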

#### 1.4.4 Sets

Sets store unique elements

In [26]:
s = set([1,2,3,1,2,3])
s

{1, 2, 3}

In [27]:
s.add(3)
s

{1, 2, 3}

In [28]:
s.add(12)
s

{1, 2, 3, 12}

In [29]:
s.update([3,4,5])
s

{1, 2, 3, 4, 5, 12}

## 2. Statements and loops

### 2.1 If statements

Conditional statements to handle different cases

In [30]:
if True:
    print('true')

true


In [31]:
if 1 == 3:
    print('1')
elif 2 == 3:
    print('2')
elif 3 == 3:
    print('3')
else:
    print('4')

3


### 2.2 For loops

Iterate through a group of items

In [32]:
for i in range(3):
    print(i)

0
1
2


In [33]:
# enumerate generates a sequence of index, value pairs
for i,x in enumerate(['a','b','c']):
    print(i,':',x)

0 : a
1 : b
2 : c


### 2.3 While loops

While loops keep iterating as long as a condition is true, and stop once it becomes false

In [34]:
i = 0
while i < 3:
    print(i)
    i+=1
print('after loop ends: i =',i)

0
1
2
after loop ends: i = 3


## 3. Functions

### 3.1 Built-in functions

In [1]:
abs(-3)

3

In [36]:
all([True, True, False])

False

In [37]:
dir()

['In',
 'Out',
 '_',
 '_10',
 '_11',
 '_12',
 '_13',
 '_14',
 '_15',
 '_16',
 '_17',
 '_18',
 '_19',
 '_22',
 '_23',
 '_24',
 '_25',
 '_26',
 '_27',
 '_28',
 '_29',
 '_3',
 '_35',
 '_36',
 '_5',
 '_6',
 '_8',
 '_9',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_dh',
 '_i',
 '_i1',
 '_i10',
 '_i11',
 '_i12',
 '_i13',
 '_i14',
 '_i15',
 '_i16',
 '_i17',
 '_i18',
 '_i19',
 '_i2',
 '_i20',
 '_i21',
 '_i22',
 '_i23',
 '_i24',
 '_i25',
 '_i26',
 '_i27',
 '_i28',
 '_i29',
 '_i3',
 '_i30',
 '_i31',
 '_i32',
 '_i33',
 '_i34',
 '_i35',
 '_i36',
 '_i37',
 '_i4',
 '_i5',
 '_i6',
 '_i7',
 '_i8',
 '_i9',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'a',
 'acos',
 'acosh',
 'antigravity',
 'asin',
 'asinh',
 'atan',
 'atan2',
 'atanh',
 'ceil',
 'comb',
 'copysign',
 'cos',
 'cosh',
 'd',
 'degrees',
 'dist',
 'e',
 'erf',
 'erfc',
 'exit',
 'exp',
 'expm1',
 'fabs',
 'factorial',
 'floor',
 'fmod',
 'frexp',
 'fsum',
 'gamma',
 'gcd',
 ...]

In [38]:
eval('1+3')

4

In [39]:
isinstance('a',str)

True

In [40]:
len([1,2,3])

3

In [41]:
max([3,4,5])

5

In [42]:
min([-3,3,9])

-3

In [43]:
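# note: the result is a float here because `from math import *` (above) replaced
# the built-in pow with math.pow; the built-in pow(3, 3) returns the int 27
pow(3,3)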

27.0

In [44]:
list(reversed([1,2,3]))

[3, 2, 1]

In [45]:
round(3.8)

4

In [46]:
round(3.3)

3

In [47]:
list(sorted([4,2,6]))

[2, 4, 6]

In [48]:
sum([1,3,5])

9

In [49]:
type(3.3)

float

### 3.2 Custom functions

You can write your own functions to minimize duplicate or long sections of code

These can be used to just 'do' something, such as print

In [50]:
def foo(arg1, arg2=2):
    # print arg1 only when arg2 is even (arg2 defaults to 2)
    if arg2 % 2 == 0:
        print(arg1)

Or you can use them to calculate and return a value

In [51]:
def bar(arg1):
    if arg1 > 10:
        return arg1 - 10
    
    return arg1 

In [52]:
bar(12)

2

In [53]:
bar(2)

2
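
Here is a small sketch of how default and keyword arguments behave when you call a function (`scale` and `say` are hypothetical functions written for this example). It also shows that a function without a `return` statement gives back `None`.

```python
def scale(value, factor=2):
    """Return value multiplied by factor (factor defaults to 2)."""
    return value * factor

default_call = scale(5)              # factor falls back to 2 -> 10
positional_call = scale(5, 3)        # arguments matched by position -> 15
keyword_call = scale(5, factor=10)   # argument matched by name -> 50

def say(message):
    # This function only prints; it has no return statement
    print(message)

result = say("hi")   # prints 'hi', and result is None

print(default_call, positional_call, keyword_call, result)
```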

# 4. Numpy

NumPy (short for Numerical Python) is a library built for large arrays and matrices. Its array operations are typically much faster than the equivalent loops in regular Python. It is the first of the three big libraries used for data science!

In [1]:
import numpy as np

## 4.1 Arrays

Numpy arrays can be created from a python list

In [2]:
a = [1,2,3,4,5,6]
b = np.array(a)
b

array([1, 2, 3, 4, 5, 6])

Right now, it looks an awful lot like a Python list, but there are some key points you should know.

Numpy arrays are:
- homogeneous (all elements in an array have the same type)
- multidimensional

In [None]:
# Homogeneous: all numpy arrays have an associated data type
# numbers are usually ints or floats
b.dtype

In [None]:
# Multidimensional: numpy arrays can have multiple dimensions, like a nested list
# We can reshape b into a 3x2 matrix
# Note: this doesn't change b. That's why we assign it to a new variable: m
m = b.reshape(3, 2)
m

In [None]:
# Each dimension is called an axis
# The size across each axis is called the shape
# These are two very important concepts!
m.shape
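
To see both properties in one place, here is a minimal sketch (the array `mixed` is made up for illustration). When a list mixes ints and floats, numpy converts everything to one common type, and `reshape` hands back a new array object while the original keeps its shape.

```python
import numpy as np

# Homogeneous: mixing ints and floats upcasts every element to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# Multidimensional: reshape returns a 3x2 array, b keeps its 1-D shape
b = np.array([1, 2, 3, 4, 5, 6])
m = b.reshape(3, 2)

print(b.shape)   # one axis of length 6
print(m.shape)   # two axes: 3 rows, 2 columns
print(m.ndim)    # number of axes
```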

## 4.2 Math

Numpy gives us a lot of math functions to work with. You can find them all in the <a href=https://numpy.org/doc/stable/reference/index.html>documentation</a>.

In [3]:
np.sum(b)

21

In [None]:
np.mean(b)

In [None]:
# for convenience, you can also call
b.mean()

You can also apply these functions to only one axis

In [None]:
# sum within each row (axis=1 collapses the columns, leaving one sum per row)
np.sum(m, axis=1)
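
The `axis` argument is easiest to understand side by side. This sketch rebuilds the same 3x2 matrix and sums it three ways (the variable names are only for illustration): the axis you pass is the one that gets collapsed.

```python
import numpy as np

m = np.arange(1, 7).reshape(3, 2)
# m looks like:
# [[1, 2],
#  [3, 4],
#  [5, 6]]

row_sums = np.sum(m, axis=1)   # collapse the columns -> one sum per row
col_sums = np.sum(m, axis=0)   # collapse the rows -> one sum per column
total = np.sum(m)              # no axis -> sum of every element

print(row_sums, col_sums, total)
```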

# 5. Pandas

Pandas is another python library which we will be using _a lot!_ It lets us look at data in tabular format and is well integrated with other libraries for plotting, machine learning, etc. (we'll see some of these next week).

In [4]:
import pandas as pd

## 5.1 Dataframes & Series

Pandas puts data into dataframes, which are made up of series. A dataframe is like a table:

In [7]:
df = pd.read_csv("../data/cereal.csv")
type(df)

# here, we're reading in data from a 'csv', or comma-separated value, file

pandas.core.frame.DataFrame

In [8]:
df

Unnamed: 0,name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
0,100% Bran,N,C,70,4,1,130,10.0,5.0,6,280,25,3,1.0,0.33,68.402973
1,100% Natural Bran,Q,C,120,3,5,15,2.0,8.0,8,135,0,3,1.0,1.00,33.983679
2,All-Bran,K,C,70,4,1,260,9.0,7.0,5,320,25,3,1.0,0.33,59.425505
3,All-Bran with Extra Fiber,K,C,50,4,0,140,14.0,8.0,0,330,25,3,1.0,0.50,93.704912
4,Almond Delight,R,C,110,2,2,200,1.0,14.0,8,-1,25,3,1.0,0.75,34.384843
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
72,Triples,G,C,110,2,1,250,0.0,21.0,3,60,25,3,1.0,0.75,39.106174
73,Trix,G,C,110,1,1,140,0.0,13.0,12,25,25,2,1.0,1.00,27.753301
74,Wheat Chex,R,C,100,3,1,230,3.0,17.0,3,115,25,1,1.0,0.67,49.787445
75,Wheaties,G,C,100,3,1,200,3.0,17.0,3,110,25,1,1.0,1.00,51.592193


We can use head(), tail(), or sample() to take a look at the data

In [9]:
# head returns the first 5 rows in the dataframe, tail returns the last 5
df.head()

Unnamed: 0,name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
0,100% Bran,N,C,70,4,1,130,10.0,5.0,6,280,25,3,1.0,0.33,68.402973
1,100% Natural Bran,Q,C,120,3,5,15,2.0,8.0,8,135,0,3,1.0,1.0,33.983679
2,All-Bran,K,C,70,4,1,260,9.0,7.0,5,320,25,3,1.0,0.33,59.425505
3,All-Bran with Extra Fiber,K,C,50,4,0,140,14.0,8.0,0,330,25,3,1.0,0.5,93.704912
4,Almond Delight,R,C,110,2,2,200,1.0,14.0,8,-1,25,3,1.0,0.75,34.384843


In [10]:
df.sample()

Unnamed: 0,name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
39,Just Right Fruit & Nut,K,C,140,3,1,170,2.0,20.0,9,95,100,3,1.3,0.75,36.471512


Each column is a pandas Series (pd.Series)

In [11]:
df["name"]

0                     100% Bran
1             100% Natural Bran
2                      All-Bran
3     All-Bran with Extra Fiber
4                Almond Delight
                ...            
72                      Triples
73                         Trix
74                   Wheat Chex
75                     Wheaties
76          Wheaties Honey Gold
Name: name, Length: 77, dtype: object

In [12]:
type(df["name"])

pandas.core.series.Series

Series are similar to numpy arrays

In [13]:
df["carbo"].mean()

14.597402597402597

In [None]:
# we can turn pd.Series into a numpy array
df["carbo"].to_numpy()

The key difference is that Series are indexed

In [None]:
# See the 0, 1, ... 76 on the left? That is the index of each item.
# Right now they are just positions, but theoretically they can be any identifier for the row

df["carbo"].index
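
As a sketch of what "any identifier" can look like, here is a tiny Series with string labels as its index. The cereal labels and values are invented for this example and are not taken from the dataset.

```python
import pandas as pd

# Made-up values with a custom (string) index
carbs = pd.Series([5.0, 8.0, 7.0], index=["Cereal A", "Cereal B", "Cereal C"])

by_label = carbs["Cereal B"]    # look up by index label
by_position = carbs.iloc[1]     # look up by position
as_array = carbs.to_numpy()     # the index is dropped, only values remain

print(carbs.index.tolist(), by_label, by_position, as_array)
```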

## 5.2 Pandas Indexing

The index in a pandas series/dataframe can be any list of values (row number, ID, time, etc.)

In [18]:
# a range index is just a numeric index
df.index

RangeIndex(start=0, stop=77, step=1)

In [19]:
# see how the leftmost column is now replaced with the cereal names
df_ = df.set_index('name')
df_.head()

name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
100% Bran,N,C,70,4,1,130,10.0,5.0,6,280,25,3,1.0,0.33,68.402973
100% Natural Bran,Q,C,120,3,5,15,2.0,8.0,8,135,0,3,1.0,1.0,33.983679
All-Bran,K,C,70,4,1,260,9.0,7.0,5,320,25,3,1.0,0.33,59.425505
All-Bran with Extra Fiber,K,C,50,4,0,140,14.0,8.0,0,330,25,3,1.0,0.5,93.704912
Almond Delight,R,C,110,2,2,200,1.0,14.0,8,-1,25,3,1.0,0.75,34.384843


In [20]:
df_.index

Index(['100% Bran', '100% Natural Bran', 'All-Bran',
       'All-Bran with Extra Fiber', 'Almond Delight',
       'Apple Cinnamon Cheerios', 'Apple Jacks', 'Basic 4', 'Bran Chex',
       'Bran Flakes', 'Cap'n'Crunch', 'Cheerios', 'Cinnamon Toast Crunch',
       'Clusters', 'Cocoa Puffs', 'Corn Chex', 'Corn Flakes', 'Corn Pops',
       'Count Chocula', 'Cracklin' Oat Bran', 'Cream of Wheat (Quick)',
       'Crispix', 'Crispy Wheat & Raisins', 'Double Chex', 'Froot Loops',
       'Frosted Flakes', 'Frosted Mini-Wheats',
       'Fruit & Fibre Dates; Walnuts; and Oats', 'Fruitful Bran',
       'Fruity Pebbles', 'Golden Crisp', 'Golden Grahams', 'Grape Nuts Flakes',
       'Grape-Nuts', 'Great Grains Pecan', 'Honey Graham Ohs',
       'Honey Nut Cheerios', 'Honey-comb', 'Just Right Crunchy  Nuggets',
       'Just Right Fruit & Nut', 'Kix', 'Life', 'Lucky Charms', 'Maypo',
       'Muesli Raisins; Dates; & Almonds', 'Muesli Raisins; Peaches; & Pecans',
       'Mueslix Crispy Blend', 'Multi-Gr

Indexing in pandas is a bit different than in built-in Python

`iloc` is used to index by row number in a dataframe

In [21]:
# this returns the first row of the dataframe
df_.iloc[0]

mfr                 N
type                C
calories           70
protein             4
fat                 1
sodium            130
fiber            10.0
carbo             5.0
sugars              6
potass            280
vitamins           25
shelf               3
weight            1.0
cups             0.33
rating      68.402973
Name: 100% Bran, dtype: object

`loc` is used to index by the series/dataframe index

In [22]:
df_.loc['All-Bran']

mfr                 K
type                C
calories           70
protein             4
fat                 1
sodium            260
fiber             9.0
carbo             7.0
sugars              5
potass            320
vitamins           25
shelf               3
weight            1.0
cups             0.33
rating      59.425505
Name: All-Bran, dtype: object

In [None]:
# multiple indices work
df.iloc[[1, 2, 3]]
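
Here is a side-by-side sketch of `iloc` and `loc` on a toy dataframe (the `toy` data and its `"a"`, `"b"`, `"c"` labels are made up). One difference that is easy to trip over: a slice with `iloc` leaves out the end position, just like a Python list, while a slice with `loc` includes the end label.

```python
import pandas as pd

toy = pd.DataFrame(
    {"calories": [70, 120, 50], "protein": [4, 3, 4]},
    index=["a", "b", "c"],
)

first_row = toy.iloc[0]           # by position
row_b = toy.loc["b"]              # by index label
cell = toy.loc["c", "calories"]   # a row label and a column name -> single value

by_position = toy.iloc[0:2]       # positions 0 and 1 (end excluded)
by_label = toy.loc["a":"b"]       # labels "a" through "b" (end included)

print(cell, len(by_position), len(by_label))
```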

We can also use boolean indexing to conditionally select data

In [None]:
df[[True] + [False] * 76]

# [True] + [False] * 76 gives us a list that looks like [True, False, ..., False] with 1 True and 76 Falses
# This matches the number of rows in our data (77)
# pandas returns all the rows with a corresponding True (in this case, only the first one)

This is powerful because we can also make comparisons with Series and values

In [None]:
df["protein"] > 3

Combining these two things, we have a very expressive way of filtering

In [None]:
# This gives us all the rows in which the protein is greater than 3.
df[df["protein"] > 3]
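
Filters can also be combined. The sketch below uses a small made-up dataframe (`toy`, with invented values) to show `&` for "and", `|` for "or", and `~` for "not". Python's own `and`/`or` keywords do not work element-wise on a Series, and each comparison needs its own parentheses when you combine them inline.

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "protein": [4, 3, 1, 6],
    "sugars": [6, 2, 12, 9],
})

high_protein = toy["protein"] > 3   # boolean Series
low_sugar = toy["sugars"] < 7       # boolean Series

both = toy[high_protein & low_sugar]                       # rows passing both tests
either = toy[(toy["protein"] > 3) | (toy["sugars"] < 7)]   # rows passing at least one
not_high = toy[~high_protein]                              # rows failing the first test

print(both["name"].tolist(), either["name"].tolist(), not_high["name"].tolist())
```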

##  5.3 Manipulating Data

Often when we're preprocessing data, we want to make changes to a specific column. We can do this by applying functions.

In [None]:
# Suppose we want to make the cereals more appetizing.
# Let's add "Delicious " to the beginning of every name.

# The pattern is we define a function for a single entry
def make_delicious(name):
    return "Delicious " + name

# and then call apply on the series to apply the function to each element in the series
df["name"].apply(make_delicious)

In [None]:
# this returns the changes, but doesn't apply them in place.
# that means on our original dataframe, the cereals are still bland
df.head()

In [None]:
# we can fix this by assigning the new names to the column.
df["name"] = df["name"].apply(make_delicious)
df.head()
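
The same pattern on a tiny made-up dataframe (`cereals`, with invented values) makes the "returns a new Series" behavior easy to check. The sketch also shows a `lambda`, which is handy when the function is too short to deserve a name.

```python
import pandas as pd

cereals = pd.DataFrame({"name": ["Bran", "Trix"], "calories": [70, 110]})

def make_delicious(name):
    return "Delicious " + name

# apply returns a new Series; the dataframe is untouched so far
renamed = cereals["name"].apply(make_delicious)
unchanged = cereals["name"].tolist()

# assigning the result back to the column makes the change stick
cereals["name"] = renamed

# a lambda works for short one-off functions
cereals["half_calories"] = cereals["calories"].apply(lambda c: c / 2)

print(unchanged)
print(cereals)
```

One practical consequence: re-running a cell that assigns `apply(make_delicious)` back to the column prepends "Delicious " a second time, so it is worth re-reading the data before re-running a cell like that.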

## 5.4 Groups and Aggregates

When we have lots and lots of data, it's more useful to look at aggregate statistics like the mean or median. But sometimes we lose too much detail aggregating across the whole dataset.

The solution is to aggregate across groups. For example, maybe we're less interested in the mean calorie count of all cereals and more interested in the mean for each manufacturer.

In [14]:
# First, we can see how many (and which) unique manufacturers there are
# Note: this gives us a numpy array
df["mfr"].unique()

array(['N', 'Q', 'K', 'R', 'G', 'P', 'A'], dtype=object)

In [15]:
# Now let's group by the manufacturers
# This gives us a groupby object across the dataframe
mfrs = df.groupby("mfr")
mfrs

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x121070b20>

In [None]:
# what happens if we try to access the calories column?
mfrs["calories"]

In [None]:
# now let's try to get the mean
mfrs["calories"].mean()

In [None]:
# we can also aggregate across multiple columns, and even use different aggregations
# let's get the average calorie count but the maximum protein
mfrs[["calories", "protein"]].agg({"calories": "mean", "protein": "max"})
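
Since the groupby object itself does not show any numbers, it can help to trace the steps on data small enough to check by hand. The dataframe below is made up for this sketch (the values are not from the cereal dataset).

```python
import pandas as pd

toy = pd.DataFrame({
    "mfr": ["K", "K", "G", "G", "G"],
    "calories": [70, 50, 110, 110, 100],
    "protein": [4, 4, 2, 1, 3],
})

groups = toy.groupby("mfr")

sizes = groups.size()                     # number of rows in each group
mean_cal = groups["calories"].mean()      # one mean per manufacturer
summary = groups.agg({"calories": "mean", "protein": "max"})

print(sizes)
print(mean_cal)
print(summary)
```

The result is indexed by the group key, so `mean_cal["K"]` or `summary.loc["G"]` pulls out the numbers for a single manufacturer.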

# 6. Plotting

<img src="https://allisonhorst.github.io/palmerpenguins/reference/figures/lter_penguins.png" width=500 />

Visualization is an important part of exploring your data. Often, we can see trends that might get lost in rows and rows of numbers.

We'll be visualizing the [Palmer Penguins dataset](https://allisonhorst.github.io/palmerpenguins/).

<img src="https://allisonhorst.github.io/palmerpenguins/reference/figures/culmen_depth.png" width=300 />

In [None]:
penguins = pd.read_csv("../data/penguins_lter.csv")

In [None]:
penguins.head()

## 6.1 matplotlib

`matplotlib` is _the_ plotting library in Python (but it's also the black sheep). Its interface is modeled on MATLAB's plotting commands... so if you're into that, good! If you're not, I'm so sorry.

In [None]:
import matplotlib.pyplot as plt

<img src="https://matplotlib.org/_images/anatomy.png" width=400/>

In [None]:
penguins.head()

### 6.1.1 Single Variable

The simplest visualizations we can do are for single variables. How are they distributed across our dataset? For continuous variables, we can use a histogram.

In [None]:
# what is the distribution of flipper lengths?
penguins["Flipper Length (mm)"].plot.hist(bins=20)

For discrete variables, we can use a bar chart.

In [None]:
penguins["Island"].value_counts().plot.bar()

### 6.1.2 Two variables

Sometimes we're also interested in how two variables relate to each other. For two continuous variables, we can use a scatter plot.

In [None]:
penguins.plot.scatter(x="Body Mass (g)", y="Flipper Length (mm)")

For a discrete and a continuous variable, you can use small multiples.

In [None]:
penguins["Body Mass (g)"].hist(by=penguins["Species"], figsize=(10, 10))

An alternative to small multiples is color-coding.

In [None]:
penguins.groupby("Species")["Body Mass (g)"].hist()

### 6.1.3 Three or more variables

How do we visualize more than two variables in two dimensions? There are a lot of options! For example, if we want to compare body mass and flipper length across species, we can again use small multiples or color-coding.

In [None]:
penguins.Species.unique()

In [None]:
colormap = {
    "Adelie Penguin (Pygoscelis adeliae)": "#ff8100",
    "Gentoo penguin (Pygoscelis papua)": "#087175",
    "Chinstrap penguin (Pygoscelis antarctica)": "#c15bcb"
}
penguins.plot.scatter(x="Body Mass (g)", y="Flipper Length (mm)", c=penguins.Species.apply(colormap.get), figsize=(10, 10))

## 7. Seaborn

As your visualizations become more complex and less exploratory, you might find `matplotlib` annoying or restricting. A good alternative is `seaborn` (the golden child), which is a plotting library that provides an abstraction over `matplotlib`.

In [None]:
import seaborn as sns
sns.set()  # apply seaborn's default theme to the plots that follow

Let's do the color-coded scatter plot again!

In [None]:
sns.scatterplot(data=penguins, x="Body Mass (g)", y="Flipper Length (mm)", hue="Species")

Seaborn also makes it easy for us to add even more dimensions to our visualization.

In [None]:
plt.figure(figsize=(10, 10))
sns.scatterplot(data=penguins, x="Body Mass (g)", y="Flipper Length (mm)", hue="Species", size="Culmen Depth (mm)")

Using some of Seaborn's more advanced visualizations, we might even discover some actionable patterns (you can see just some of the things `seaborn` can do [here](https://seaborn.pydata.org/examples/index.html)).

Let's go back to our initial 1-dimensional continuous plots for a second.

In [None]:
sns.displot(data=penguins, x="Culmen Length (mm)", bins=30)

This is cool and all, but what if we want a smoother representation of the distribution? Seaborn provides "kernel density estimates", which is a fancy way of saying a smoother version of the histogram.

In [None]:
sns.displot(data=penguins, x="Culmen Length (mm)", kind="kde")

We can even plot the KDEs of two different continuous variables against each other!

In [None]:
sns.jointplot(data=penguins, x="Culmen Length (mm)", y="Culmen Depth (mm)", kind="kde")

Now, let's split by species to see if we can spot any differences.

In [None]:
sns.jointplot(data=penguins, x="Culmen Length (mm)", y="Culmen Depth (mm)", hue="Species", kind="kde")