In [161]:
import pandas as pd
import numpy as np
from pandas import Series, DataFrame


<b><h1>Series   5.1 SERIES</h1></b>
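A Series is a one-dimensional labeled array: a list of values, each paired with an index label, much like a dict that keeps its order. When you display a Series, pandas lists each index label next to its value. If you don't supply an index, pandas numbers the rows starting at 0.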

In [162]:
obj = pd.Series([0,2,5,9,10])
obj

0     0
1     2
2     5
3     9
4    10
dtype: int64
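To see the two halves of a Series separately, a small sketch (the names `values` and `labels` are just for illustration): `.values` holds the data and `.index` holds the labels.

```python
import pandas as pd

obj = pd.Series([0, 2, 5, 9, 10])

# The data and the labels live in two separate attributes
values = obj.values        # NumPy array of the data
labels = list(obj.index)   # default labels count up from 0

print(values)
print(labels)   # [0, 1, 2, 3, 4]
```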

<h2>Name the index</h2>
Pass index= to label your rows. Both the values and the labels go in lists [ ], and the two lists must be the same length.

In [163]:
obj2 = pd.Series([4,3,2,9], index=["cats","birds","dogs","spiders"])
obj2

cats       4
birds      3
dogs       2
spiders    9
dtype: int64

<h2>Call cells</h2>
You can then call cells via their index labels.

In [164]:
# Single quotes inside the f-string keep this working on Python versions before 3.12
print(f"There are {obj2['cats']} cats.")
print(f"We have {obj2['dogs']} dogs, and {obj2['spiders']} spiders.")

There are 4 cats.
We have 2 dogs, and 9 spiders.


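You can use math and conditionals as well. A comparison such as obj2>3 produces a True/False mask, and indexing with that mask keeps only the rows that are True.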

In [165]:
obj2[obj2>3]

cats       4
spiders    9
dtype: int64

In [166]:
"fish" in obj2

False

In [167]:
obj2["cats"]==9

np.False_
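The cells above only show conditionals, so here is a small sketch of the math side. It rebuilds `obj2` so it runs on its own; `doubled` and `total` are made-up names.

```python
import pandas as pd

obj2 = pd.Series([4, 3, 2, 9], index=["cats", "birds", "dogs", "spiders"])

doubled = obj2 * 2    # every value is multiplied, labels stay put
total = obj2.sum()    # 4 + 3 + 2 + 9

print(doubled)
print(total)   # 18
```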

You can pass in dicts.

In [168]:
yarns={"red":2,"blue":5,"orange":1}
obj3 = pd.Series(yarns)
obj3

red       2
blue      5
orange    1
dtype: int64

<h2>.to_</h2>
The .to_ methods (to_csv, to_sql, to_dict, to_json, and others) export a Series or DF to a CSV, a SQL table, a dict, or JSON.

In [169]:
obj3.to_dict()

{'red': 2, 'blue': 5, 'orange': 1}
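Two more of the .to_ methods, as a rough sketch. It assumes you want the text back rather than a file on disk, so no path is passed to `to_csv`.

```python
import json
import pandas as pd

obj3 = pd.Series({"red": 2, "blue": 5, "orange": 1})

as_json = obj3.to_json()   # a JSON string
as_csv = obj3.to_csv()     # with no path given, to_csv returns the CSV text

print(as_json)
print(as_csv)
```

Passing a filename, as in `obj3.to_csv("yarns.csv")` (a hypothetical file name), writes a file instead of returning text.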

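You can add Series together. Pandas lines the values up by index label, and any label that appears in only one of them gives NaN. obj2 and obj3 share no labels, so every result here is NaN.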

In [170]:
obj2+obj3

birds     NaN
blue      NaN
cats      NaN
dogs      NaN
orange    NaN
red       NaN
spiders   NaN
dtype: float64
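For contrast, a sketch where one label does overlap. The two Series here (`pets` and `fosters`) are made up for the example. `.add()` with `fill_value=0` is one way to treat a missing label as 0 instead of getting NaN.

```python
import pandas as pd

pets = pd.Series({"cats": 4, "dogs": 2})
fosters = pd.Series({"dogs": 1, "fish": 7})

plain = pets + fosters                     # only "dogs" is in both
filled = pets.add(fosters, fill_value=0)   # a missing label counts as 0

print(plain)
print(filled)
```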

<h2>.name=</h2>
You can label the index with .index.name and the Series itself with .name.

In [171]:
obj3.index.name="Colors"
obj3.name="Yarn Inventory"
obj3

Colors
red       2
blue      5
orange    1
Name: Yarn Inventory, dtype: int64

<b><h1>DataFrame    5.1 DATAFRAME</h1></b>
A DataFrame is a table of columns that share one index; each column is a Series. DataFrames can be made from a dict of equal-length lists, or loaded from CSVs, JSON, and SQL.

In [172]:
data = {"Magic":["Glitter Bomb","Icey Beam","Petal Flurry","Sugar Crash"],"Mane Color":["Pink","White","Red","Green"],"Age":[209,3094,200,35]}
frame = pd.DataFrame(data)
frame

          Magic  Mane Color   Age
0  Glitter Bomb        Pink   209
1     Icey Beam       White  3094
2  Petal Flurry         Red   200
3   Sugar Crash       Green    35


<h2>.head and .tail</h2> 
They give the first or last 5 rows of a df. Put a number in the () to get a specific number of rows. Separately, when you build a DataFrame you can pass columns= to set the order of the cols, as in the next cell.

In [173]:
pd.DataFrame(data, columns=["Age","Mane Color","Magic"])

    Age  Mane Color         Magic
0   209        Pink  Glitter Bomb
1  3094       White     Icey Beam
2   200         Red  Petal Flurry
3    35       Green   Sugar Crash
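The cell above shows column order but not .head or .tail themselves, so here is a quick sketch using the same data. `first_two` and `last_one` are made-up names.

```python
import pandas as pd

data = {"Magic": ["Glitter Bomb", "Icey Beam", "Petal Flurry", "Sugar Crash"],
        "Mane Color": ["Pink", "White", "Red", "Green"],
        "Age": [209, 3094, 200, 35]}
frame = pd.DataFrame(data)

first_two = frame.head(2)   # rows 0 and 1
last_one = frame.tail(1)    # row 3 only

print(first_two)
print(last_one)
```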


<h2>df["col name"] and loc[]/iloc[]</h2>
You can call a col by ["col name"], and a row with loc (by index label) or iloc (by integer position). You can get ranges of rows by slicing, as in iloc[2:]. Caps matter!

In [174]:
frame["Age"]

0     209
1    3094
2     200
3      35
Name: Age, dtype: int64

In [175]:
frame.loc[2]

Magic         Petal Flurry
Mane Color             Red
Age                    200
Name: 2, dtype: object

In [176]:
frame.iloc[2:]

          Magic  Mane Color  Age
2  Petal Flurry         Red  200
3   Sugar Crash       Green   35
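One thing worth checking when slicing: loc slices by label and includes the end label, while iloc slices by position and stops before the end, like a normal Python list. A small sketch with a cut-down version of the table:

```python
import pandas as pd

frame = pd.DataFrame({"Magic": ["Glitter Bomb", "Icey Beam", "Petal Flurry", "Sugar Crash"],
                      "Age": [209, 3094, 200, 35]})

by_label = frame.loc[1:2]       # labels 1 through 2, both included
by_position = frame.iloc[1:2]   # position 1 only, stops before 2

print(by_label)
print(by_position)
```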


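(5.2) loc and iloc can return more than one cell when you list more than one thing in the index. One row with a list of columns comes back as a Series; several rows come back as a new df. The cells below were run after the Biome column and the name index were added further down, which is why their In [ ] numbers are higher.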

In [192]:
frame

                  Magic  Mane Color   Age    Biome
Sparkles   Glitter Bomb        Pink   209   Forest
Moonbeam      Icey Beam       White  3094    Artic
Rosey Pie  Petal Flurry         Red   200  Gardens
Sprinkles   Sugar Crash       Green    35   Clouds


In [191]:
#row 0 Sparkles, columns 2,0,1 listed
weird = frame.iloc[0,[2,0,1]]
weird

Age                    209
Magic         Glitter Bomb
Mane Color            Pink
Name: Sparkles, dtype: object

You can use label names with .loc

In [204]:
frame

                  Magic  Mane Color   Age    Biome
Sparkles   Glitter Bomb        Pink   209   Forest
Moonbeam      Icey Beam       White  3094    Artic
Rosey Pie  Petal Flurry         Red   200  Gardens
Sprinkles   Sugar Crash       Green    35   Clouds


In [203]:
wut=frame.loc["Moonbeam",["Age","Magic"]]
wut

Age           3094
Magic    Icey Beam
Name: Moonbeam, dtype: object

In [198]:
#a conditional on a column narrows the table to the rows where it is True
odd=frame.loc[frame.Age >100]
odd

                  Magic  Mane Color   Age    Biome
Sparkles   Glitter Bomb        Pink   209   Forest
Moonbeam      Icey Beam       White  3094    Artic
Rosey Pie  Petal Flurry         Red   200  Gardens
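Conditions can be combined too. A sketch, assuming you want rows that are over 100 and not white: each condition gets its own parentheses, `&` means "and", and `|` means "or". The name `old_not_white` is made up.

```python
import pandas as pd

frame = pd.DataFrame({"Mane Color": ["Pink", "White", "Red", "Green"],
                      "Age": [209, 3094, 200, 35]},
                     index=["Sparkles", "Moonbeam", "Rosey Pie", "Sprinkles"])

# Both conditions must be True for a row to be kept
old_not_white = frame.loc[(frame["Age"] > 100) & (frame["Mane Color"] != "White")]

print(old_not_white)
```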


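You can also rewrite data using loc. The first part picks the rows, the second part picks the column, and the value after = is written into those cells.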

In [207]:
frame.loc[frame.Age==209, "Age"]=211
frame

                  Magic  Mane Color   Age    Biome
Sparkles   Glitter Bomb        Pink   211   Forest
Moonbeam      Icey Beam       White  3094    Artic
Rosey Pie  Petal Flurry         Red   200  Gardens
Sprinkles   Sugar Crash       Green    35   Clouds


Happy birthday, Sparkles!

<h2>Add a column</h2>
Add a column by assigning a list of values to df["col name"]. The list needs one value per row.

In [177]:
frame["Biome"]=["Forest","Artic","Gardens","Clouds"]
frame

          Magic  Mane Color   Age    Biome
0  Glitter Bomb        Pink   209   Forest
1     Icey Beam       White  3094    Artic
2  Petal Flurry         Red   200  Gardens
3   Sugar Crash       Green    35   Clouds


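<h2>Incomplete Series addition with indexes</h2>
You can add a column from a Series that is missing some values by giving the Series an index. Pandas matches the values to rows by index label and fills the rest with NaN.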

In [178]:
scent = pd.Series(["Mint","Rose"], index=[1,2])
frame["Scent"]=scent
frame

          Magic  Mane Color   Age    Biome  Scent
0  Glitter Bomb        Pink   209   Forest    NaN
1     Icey Beam       White  3094    Artic   Mint
2  Petal Flurry         Red   200  Gardens   Rose
3   Sugar Crash       Green    35   Clouds    NaN
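Why a Series and not a plain list? A list has no index to line up with, so a list that is too short raises an error. A sketch with a one-column version of the table (`list_worked` is a made-up flag for the example):

```python
import pandas as pd

frame = pd.DataFrame({"Magic": ["Glitter Bomb", "Icey Beam", "Petal Flurry", "Sugar Crash"]})

try:
    frame["Scent"] = ["Mint", "Rose"]   # 2 values for 4 rows
    list_worked = True
except ValueError:
    list_worked = False

# A Series with an index says which rows the values belong to
frame["Scent"] = pd.Series(["Mint", "Rose"], index=[1, 2])

print(list_worked)
print(frame)
```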


In [179]:
#Tired of not having names on the index
frame.index=["Sparkles","Moonbeam","Rosey Pie","Sprinkles"]
frame

                  Magic  Mane Color   Age    Biome  Scent
Sparkles   Glitter Bomb        Pink   209   Forest    NaN
Moonbeam      Icey Beam       White  3094    Artic   Mint
Rosey Pie  Petal Flurry         Red   200  Gardens   Rose
Sprinkles   Sugar Crash       Green    35   Clouds    NaN


<h2>del to delete</h2>
Delete cols with del. This changes the df itself rather than making a copy.

In [180]:
del frame["Scent"]
frame

                  Magic  Mane Color   Age    Biome
Sparkles   Glitter Bomb        Pink   209   Forest
Moonbeam      Icey Beam       White  3094    Artic
Rosey Pie  Petal Flurry         Red   200  Gardens
Sprinkles   Sugar Crash       Green    35   Clouds
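If you would rather keep the original table, `drop(columns=...)` is an alternative to del that hands back a new df. A sketch with a two-row table (`trimmed` is a made-up name):

```python
import pandas as pd

frame = pd.DataFrame({"Magic": ["Icey Beam", "Petal Flurry"],
                      "Scent": ["Mint", "Rose"]})

trimmed = frame.drop(columns=["Scent"])   # new df without the column

print(list(frame.columns))     # ['Magic', 'Scent']
print(list(trimmed.columns))   # ['Magic']
```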


<h2>DF made by dict of dicts</h2>
You can pass in a dict of dicts, and pandas will name the cols with the outer keys and the index with the inner keys.

In [181]:
abel_personel = {"Job":{"Sam":"Radio Operator","Janine":"Director","Dr Myers":"Doctor"},
                 "Fav Food":{"Sam":"Curly Wurlies","Janine":"Potatoes","Dr Myers":"Apples"}}

abel_frame = pd.DataFrame(abel_personel)
abel_frame

                     Job       Fav Food
Sam       Radio Operator  Curly Wurlies
Janine          Director       Potatoes
Dr Myers          Doctor         Apples
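The inner dicts don't have to share every key. A sketch with made-up data (`stock` is not from the notebook): an inner key missing from one column shows up as NaN, and `.T` flips rows and cols.

```python
import pandas as pd

stock = {"Price": {"apples": 3, "pears": 4},
         "Count": {"apples": 10}}          # no "pears" here

table = pd.DataFrame(stock)
flipped = table.T    # outer keys become the rows

print(table)
print(flipped)
```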


Reorganize the rows:

In [182]:
abp2=abel_frame.reindex(["Janine","Dr Myers","Sam"])
abp2

                     Job       Fav Food
Janine          Director       Potatoes
Dr Myers          Doctor         Apples
Sam       Radio Operator  Curly Wurlies


<h1>Essential Functionality INTERPOLATION</h1>

Interpolation: You can fill in info that is missing from the original source:

In [183]:
fgames=pd.Series(["Portal","BioShock","Cult of the Lamb"], index=[0,3,5])
fgames

0              Portal
3            BioShock
5    Cult of the Lamb
dtype: object

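<h2>np.arange and the ffill method</h2>
To stretch a table, reindex it with np.arange, which supplies every index from 0 up to (but not including) the number you give. Indexes that were not in the original start out empty; method="ffill" (forward fill) copies the last listed value into them.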

In [184]:
fgames.reindex(np.arange(6),method="ffill")
##This makes a new Series; it does not replace the old one.

0              Portal
1              Portal
2              Portal
3            BioShock
4            BioShock
5    Cult of the Lamb
dtype: object
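For comparison, a sketch of the same reindex with no method and with `method="bfill"` (backward fill), which copies the next listed value instead of the last one. `gaps` and `back` are made-up names.

```python
import numpy as np
import pandas as pd

fgames = pd.Series(["Portal", "BioShock", "Cult of the Lamb"], index=[0, 3, 5])

gaps = fgames.reindex(np.arange(6))                  # new indexes stay empty (NaN)
back = fgames.reindex(np.arange(6), method="bfill")  # fill from the next value

print(gaps)
print(back)
```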

<h2>.reshape(())</h2>
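.reshape( (tuple) ) takes (rows, columns) and rearranges the values into that shape. The total number of values has to stay the same: the 9 values from np.arange(9) fit (3,3).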

In [185]:
exex=pd.DataFrame(np.arange(9).reshape((3,3)),
index=["Sam","Janine","Maxine"],columns=["Games Won","Games Lost","Games Tied"])
exex

        Games Won  Games Lost  Games Tied
Sam             0           1           2
Janine          3           4           5
Maxine          6           7           8
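A small sketch of reshape on its own, to show the (rows, columns) order and what happens when the shape does not fit. `fits` is a made-up flag for the example.

```python
import numpy as np

nums = np.arange(6)           # 0, 1, 2, 3, 4, 5
grid = nums.reshape((2, 3))   # 2 rows, 3 columns

try:
    nums.reshape((4, 2))      # 8 cells needed, only 6 values
    fits = True
except ValueError:
    fits = False

print(grid)
print(fits)   # False
```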


<h2>.reindex(index=["rows"])</h2>
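index labels the rows, columns labels the cols. In exex the cells were filled with the numbers from np.arange. Reindexing keeps the rows whose labels match and adds empty rows for new labels. Caps matter: "maxine" does not match "Maxine", so that row comes back empty.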

In [186]:
atgn=exex.reindex(index=["Sam","Jodie","Janine","maxine","Five"])
atgn

        Games Won  Games Lost  Games Tied
Sam           0.0         1.0         2.0
Jodie         NaN         NaN         NaN
Janine        3.0         4.0         5.0
maxine        NaN         NaN         NaN
Five          NaN         NaN         NaN
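If NaN is not what you want in the new rows, reindex also takes a `fill_value`. A sketch, assuming 0 is a sensible default for a player with no games yet (`with_zeros` is a made-up name):

```python
import numpy as np
import pandas as pd

exex = pd.DataFrame(np.arange(9).reshape((3, 3)),
                    index=["Sam", "Janine", "Maxine"],
                    columns=["Games Won", "Games Lost", "Games Tied"])

# New labels get 0 instead of NaN
with_zeros = exex.reindex(index=["Sam", "Jodie", "Janine"], fill_value=0)

print(with_zeros)
```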


If you add rows with reindex, they will be filled with NaNs. This DOES NOT alter the original table, so the result needs to be stored in a var. The same goes for new columns:

In [187]:
gncols=["Games Won","Games Lost","Games Tied","Games Missed"]
atgn2=atgn.reindex(columns=gncols)
atgn2

        Games Won  Games Lost  Games Tied  Games Missed
Sam           0.0         1.0         2.0           NaN
Jodie         NaN         NaN         NaN           NaN
Janine        3.0         4.0         5.0           NaN
maxine        NaN         NaN         NaN           NaN
Five          NaN         NaN         NaN           NaN


<h2>.drop("")</h2>
You can drop a row by its index label. drop returns a new table and leaves the original unchanged.

In [188]:
atgn3=atgn2.drop("Five")
atgn3

        Games Won  Games Lost  Games Tied  Games Missed
Sam           0.0         1.0         2.0           NaN
Jodie         NaN         NaN         NaN           NaN
Janine        3.0         4.0         5.0           NaN
maxine        NaN         NaN         NaN           NaN
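drop also takes a list of labels, and `columns=` drops cols instead of rows. A sketch with a small made-up table (`table`, `fewer_rows`, and `fewer_cols` are names for this example only):

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(np.arange(6).reshape((3, 2)),
                     index=["Sam", "Jodie", "Five"],
                     columns=["Games Won", "Games Lost"])

fewer_rows = table.drop(["Jodie", "Five"])        # drop two rows at once
fewer_cols = table.drop(columns=["Games Lost"])   # drop a column

print(fewer_rows)
print(fewer_cols)
```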
