<h2>Topic 1: Introduction to Pandas</h2>

---

In [11]:
%%capture
%pip install pandas
%pip install numpy

import pandas
import numpy

---
<h3>Problem 1: How do we add two lists together like this:</h3>

<img src="./images/list.jpg" width="400" height="300">

---

In [12]:
L1 = [5, 3, 2, 7]
L2 = [2, 3, 4, 1]
L3 = L1 + L2
print(L3)

[5, 3, 2, 7, 2, 3, 4, 1]


---
<h4>Plain Python lists support concatenation: the + operator joins two lists end to end. To add the elements pairwise without writing a for loop, we use a NumPy array:</h4>

---

In [17]:
L1 = numpy.array(L1)
L2 = numpy.array(L2)
L3 = L1 + L2
print(L3)

# If we still want to concatenate we can simply do:

L3 = numpy.concatenate((L1, L2))
print(L3)

[7 6 6 8]
[5 3 2 7 2 3 4 1]


---
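To see what the NumPy `+` above replaces, here is a minimal sketch of the same element-wise sum written by hand. The names `a`, `b`, `loop_sum`, and `array_sum` are illustrative and not part of the original notebook.

```python
import numpy

a = [5, 3, 2, 7]
b = [2, 3, 4, 1]

# Plain Python: pair up the elements with zip and add each pair
loop_sum = [x + y for x, y in zip(a, b)]
print(loop_sum)  # [7, 6, 6, 8]

# NumPy: the + operator does the same pairing for us
array_sum = numpy.array(a) + numpy.array(b)
print(array_sum)  # [7 6 6 8]
```

---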
<h4>The real power shows up with higher-dimensional data (e.g. matrices), which NumPy fully supports:</h4>

---

In [None]:
M1 = [[1, 2], [3, 4]]
M1 = numpy.array(M1)

print(M1 + 5, "\n") # Adds 5 to ALL entries

print(M1 * 5, "\n") # Multiplies ALL entries by 5

print(M1 + M1 ** 2, "\n") # Adds the element-wise square of M1 to M1 (note that this is NOT matrix multiplication)

print(M1 @ M1) # Matrix Multiplication (we won't get into this)

[[6 7]
 [8 9]] 

[[ 5 10]
 [15 20]] 

[[ 2  6]
 [12 20]] 

[[ 7 10]
 [15 22]]


---
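The difference between element-wise multiplication and matrix multiplication noted above can be seen side by side in this small sketch. The names `M`, `elementwise`, and `matrix_product` are illustrative only.

```python
import numpy

M = numpy.array([[1, 2], [3, 4]])

# * multiplies each entry by the entry in the same position (same as M ** 2 here)
elementwise = M * M
print(elementwise)

# @ combines rows of the left matrix with columns of the right one
matrix_product = M @ M
print(matrix_product)
```

---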

<h4>Problem 2: How do we represent missing data?</h4>

---

In [52]:
# We can use numpy.nan to represent missing data; this is also called null or NaN

MyList = [1, 5, 3, 5, numpy.nan]
L1 = numpy.array(MyList)

# NaN propagates: the operation applies to every other entry, and the missing position stays nan

L1 + 7

array([ 8., 12., 10., 12., nan])

---
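Because NaN propagates through arithmetic, a plain sum over the array above also comes out as NaN. A minimal sketch of how to detect and skip missing entries, using `numpy.isnan` and `numpy.nansum`; the variable names `values` and `missing` are illustrative.

```python
import numpy

values = numpy.array([1, 5, 3, 5, numpy.nan])

# A plain sum is "poisoned" by the missing entry
print(values.sum())          # nan

# isnan marks which positions are missing
missing = numpy.isnan(values)
print(missing)

# nansum adds up only the entries that are present
print(numpy.nansum(values))  # 14.0
```

---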

<h4>Finally, an important feature of NumPy arrays is boolean indexing (or masking):</h4>

<img src="./images/boolean.jpg" height="300" width="500"/>

---

In [62]:
L1 = numpy.array([1, 2, 5, 6, 3, 4])

# Step 1
L2 = L1 > 3 # This creates a boolean mask
print(L2)

# Step 2
L3 = L1[L2] # This applies the boolean mask
print(L3)

# But it's easier to just do:
L3 = L1[L1 > 3]
print(L3)

[False False  True  True False  True]
[5 6 4]
[5 6 4]


---
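Masks can also be combined or inverted. The sketch below reuses the array from the cell above; `between` and `not_big` are illustrative names. Note that each condition needs its own parentheses when combined with `&` or `|`.

```python
import numpy

L1 = numpy.array([1, 2, 5, 6, 3, 4])

# Combine two conditions with & (and); use | for "or"
between = L1[(L1 > 2) & (L1 < 6)]
print(between)  # [5 3 4]

# ~ inverts a mask, keeping the entries where the condition is False
not_big = L1[~(L1 > 3)]
print(not_big)  # [1 2 3]
```

---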

<h4>Now let's get into Pandas data structures, which are built on NumPy arrays. Pandas has two main data structures:</h4>

#### 1. Series
#### 2. DataFrame

---

<h4> Pandas Series: </h4>

<img src="./images/series.jpg" height="300" width="400">

---

In [34]:
# Pandas Series

S1 = pandas.Series([9, 9, 3], index=[1, 2, 3])
S1

1    9
2    9
3    3
dtype: int64

---
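A Series pairs each value with an index label, so a value can be looked up either by label or by position. A minimal sketch using the same Series as above:

```python
import pandas

S1 = pandas.Series([9, 9, 3], index=[1, 2, 3])

print(S1.index.tolist())  # [1, 2, 3]  the labels
print(S1.to_numpy())      # [9 9 3]    the underlying values as a NumPy array
print(S1.loc[3])          # 3          value stored under label 3
print(S1.iloc[0])         # 9          first value by position
```

---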

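<h4>Note that arithmetic between two Series aligns values by index label, not by position. The result's index is the union of both indexes (sorted here), and labels present in only one Series produce NaN:</h4>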

---

In [65]:
S1 = pandas.Series([20, 23, 34], index=["A", 2, 3])
S2 = pandas.Series([66, 65, 55, 43], index=[1, "A", 3, "Z"])
S1 + S2

1     NaN
2     NaN
3    89.0
A    85.0
Z     NaN
dtype: float64

---
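When a label missing from one side should count as zero instead of producing NaN, `Series.add` accepts a `fill_value`. The sketch below uses two illustrative Series, `a` and `b`, with string labels only; they are not the ones from the cell above.

```python
import pandas

a = pandas.Series([20, 23, 34], index=["A", "B", "C"])
b = pandas.Series([66, 65, 55, 43], index=["D", "A", "C", "Z"])

# A Series itself keeps the order it was created with
print(b.index.tolist())  # ['D', 'A', 'C', 'Z']

# + matches values by label; labels found in only one Series give NaN
total = a + b
print(total)

# add() with fill_value treats a label missing from one side as 0
filled = a.add(b, fill_value=0)
print(filled)
```

---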
<h4>Pandas Series support the same element-wise arithmetic and boolean masking as NumPy arrays, with index labels added on top. We will now move on to DataFrames, the central data structure in Pandas and in real-life data science work!</h4>

---

<h2>Topic 2: Data Querying</h2>

---

<h2>Topic 3: Data Manipulation</h2>

---

<h2>Topic 4: Working with Multiple Tables</h2>

---