In [314]:
import pandas as pd
import numpy as np
from pandas import Series, DataFrame


<b><h1>Series   5.1 SERIES</h1></b>
A Series is a one-dimensional array of values with a label (the index) attached to each one, a bit like an ordered dict. When pandas displays a Series, it lists the index on the left and the values on the right.

In [315]:
obj = pd.Series([0,2,5,9,10])
obj

0     0
1     2
2     5
3     9
4    10
dtype: int64
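
A minimal sketch of the two parts of a Series, its values and its index. The names `s`, `values`, and `labels` are only for illustration:

```python
import pandas as pd

s = pd.Series([0, 2, 5, 9, 10])

# The data and the labels are stored separately.
values = s.to_numpy()    # the underlying NumPy array
labels = list(s.index)   # the default index counts up from 0

print(values)
print(labels)
```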

<h2>Name the index</h2>
You can pass index= to label your rows. Both the values and the labels go in lists [ ].

In [316]:
obj2 = pd.Series([4,3,2,9], index=["cats","birds","dogs","spiders"])
obj2

cats       4
birds      3
dogs       2
spiders    9
dtype: int64

<h2>Call cells</h2>
You can then call cells via their index labels.

In [317]:
# Single quotes inside the f-string also work on Python versions before 3.12
print(f"There are {obj2['cats']} cats.")
print(f"We have {obj2['dogs']} dogs, and {obj2['spiders']} spiders.")

There are 4 cats.
We have 2 dogs, and 9 spiders.


You can use math and conditionals as well.

In [318]:
obj2[obj2>3]

cats       4
spiders    9
dtype: int64

In [319]:
"fish" in obj2

False

In [320]:
obj2["cats"]==9

np.False_

You can pass in dicts.

In [321]:
yarns={"red":2,"blue":5,"orange":1}
obj3 = pd.Series(yarns)
obj3

red       2
blue      5
orange    1
dtype: int64

<h2>.to_</h2>
The .to_ methods write a Series or DataFrame out to another format: to_csv, to_sql, to_dict, or to_json.

In [322]:
obj3.to_dict()

{'red': 2, 'blue': 5, 'orange': 1}
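
A short sketch of two more .to_ methods, assuming the same yarn data as above. The variable names are made up for the example:

```python
import json
import pandas as pd

yarn = pd.Series({"red": 2, "blue": 5, "orange": 1})

as_json = yarn.to_json()   # a JSON string keyed by index label
as_csv = yarn.to_csv()     # with no path given, to_csv returns the CSV text

print(json.loads(as_json))
print(as_csv)
```

Passing a file path to to_csv or to_json writes a file instead of returning a string.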

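You can add Series together. pandas matches values by index label, so labels found in only one Series come out as NaN. obj2 and obj3 share no labels, which is why every result here is NaN.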

In [323]:
obj2+obj3

birds     NaN
blue      NaN
cats      NaN
dogs      NaN
orange    NaN
red       NaN
spiders   NaN
dtype: float64

<h2>.name=</h2>
You can label the index with .index.name and the Series itself with .name.

In [324]:
obj3.index.name="Colors"
obj3.name="Yarn Inventory"
obj3

Colors
red       2
blue      5
orange    1
Name: Yarn Inventory, dtype: int64

<b><h1>DataFrame    5.1 DATAFRAME</h1></b>
A DataFrame is a table of columns that share one index; each column is a Series. DataFrames can be made from dicts of equal-length lists, CSVs, JSON, and SQL.

In [325]:
data = {"Magic":["Glitter Bomb","Icey Beam","Petal Flurry","Sugar Crash"],"Mane Color":["Pink","White","Red","Green"],"Age":[209,3094,200,35]}
frame = pd.DataFrame(data)
frame

Unnamed: 0,Magic,Mane Color,Age
0,Glitter Bomb,Pink,209
1,Icey Beam,White,3094
2,Petal Flurry,Red,200
3,Sugar Crash,Green,35


<h2>.head and .tail</h2> 
head() and tail() return the first or last 5 rows of a DataFrame. Put a number in the () to get that many rows. Separately, you can set the order of the columns when building a DataFrame by passing columns=, as in the next cell.

In [326]:
pd.DataFrame(data, columns=["Age","Mane Color","Magic"])

Unnamed: 0,Age,Mane Color,Magic
0,209,Pink,Glitter Bomb
1,3094,White,Icey Beam
2,200,Red,Petal Flurry
3,35,Green,Sugar Crash
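
The cell above only shows columns=, so here is a small sketch of head() and tail() themselves. The frame `ponies` is a cut-down stand-in for `frame`, not the notebook's own variable:

```python
import pandas as pd

ponies = pd.DataFrame({
    "Magic": ["Glitter Bomb", "Icey Beam", "Petal Flurry", "Sugar Crash"],
    "Age": [209, 3094, 200, 35],
})

first_two = ponies.head(2)   # rows 0 and 1
last_one = ponies.tail(1)    # row 3 only

print(first_two)
print(last_one)
```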


In [327]:
frame.index=["Sparkles","Moonbeam","Rosey","Sprinkles"]
frame
# Name the indexes

Unnamed: 0,Magic,Mane Color,Age
Sparkles,Glitter Bomb,Pink,209
Moonbeam,Icey Beam,White,3094
Rosey,Petal Flurry,Red,200
Sprinkles,Sugar Crash,Green,35


<h2>loc[]/iloc[]</h2>
You can call a column with ["col name"] and a row with loc (by index label) or iloc (by integer position). You can get ranges of rows by slicing with iloc. Caps matter!

In [328]:
frame

Unnamed: 0,Magic,Mane Color,Age
Sparkles,Glitter Bomb,Pink,209
Moonbeam,Icey Beam,White,3094
Rosey,Petal Flurry,Red,200
Sprinkles,Sugar Crash,Green,35


In [329]:
frame["Age"]

Sparkles      209
Moonbeam     3094
Rosey         200
Sprinkles      35
Name: Age, dtype: int64

In [330]:
frame.loc["Moonbeam"]

Magic         Icey Beam
Mane Color        White
Age                3094
Name: Moonbeam, dtype: object

In [331]:
frame.iloc[2:]

Unnamed: 0,Magic,Mane Color,Age
Rosey,Petal Flurry,Red,200
Sprinkles,Sugar Crash,Green,35
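
One difference worth a quick sketch: a loc slice uses labels and includes both ends, while an iloc slice uses positions and leaves the end out. `ponies` is a small stand-in frame for illustration:

```python
import pandas as pd

ponies = pd.DataFrame(
    {"Age": [209, 3094, 200, 35]},
    index=["Sparkles", "Moonbeam", "Rosey", "Sprinkles"],
)

by_label = ponies.loc["Moonbeam":"Rosey"]   # label slice: both ends included
by_position = ponies.iloc[1:2]              # position slice: end excluded

print(by_label)
print(by_position)
```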


### (5.2) loc and iloc can return a subset by listing more than one thing in the index.

In [332]:
frame

Unnamed: 0,Magic,Mane Color,Age
Sparkles,Glitter Bomb,Pink,209
Moonbeam,Icey Beam,White,3094
Rosey,Petal Flurry,Red,200
Sprinkles,Sugar Crash,Green,35


In [333]:
#row 0 Sparkles, columns 2,0,1 listed
weird = frame.iloc[0,[2,0,1]]
weird

Age                    209
Magic         Glitter Bomb
Mane Color            Pink
Name: Sparkles, dtype: object

You can use label names with .loc

In [334]:
frame

Unnamed: 0,Magic,Mane Color,Age
Sparkles,Glitter Bomb,Pink,209
Moonbeam,Icey Beam,White,3094
Rosey,Petal Flurry,Red,200
Sprinkles,Sugar Crash,Green,35


In [335]:
wut=frame.loc["Moonbeam",["Age","Magic"]]
wut

Age           3094
Magic    Icey Beam
Name: Moonbeam, dtype: object

In [336]:
# A conditional on a column keeps only the rows where it is True
odd = frame.loc[frame.Age > 100]
odd

Unnamed: 0,Magic,Mane Color,Age
Sparkles,Glitter Bomb,Pink,209
Moonbeam,Icey Beam,White,3094
Rosey,Petal Flurry,Red,200


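You can also overwrite data using loc. The first argument picks the rows (here with a conditional) and the second names the column to set. This cell was run after the later cells that add the Biome column and rename Rosey to Rosey Pie, so both show up in its output.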

In [360]:
frame.loc[frame.Age==209, "Age"]=210
frame

Unnamed: 0,Magic,Mane Color,Age,Biome
Sparkles,Glitter Bomb,Pink,210,Forest
Moonbeam,Icey Beam,White,3094,Artic
Rosey Pie,Petal Flurry,Red,200,Gardens
Sprinkles,Sugar Crash,Green,35,Clouds


Happy birthday Sparkles!

<h2>Add a column</h2>
Add a column by assigning a list of values to df["col name"]. The list must have one value per row.

In [338]:
frame["Biome"]=["Forest","Artic","Gardens","Clouds"]
frame

Unnamed: 0,Magic,Mane Color,Age,Biome
Sparkles,Glitter Bomb,Pink,209,Forest
Moonbeam,Icey Beam,White,3094,Artic
Rosey,Petal Flurry,Red,200,Gardens
Sprinkles,Sugar Crash,Green,35,Clouds


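<h2>Incomplete Series addition with indexes</h2>
You can add a Series with missing values as a column; pandas lines it up with the frame by index label and fills the gaps with NaN. The labels have to match the frame's index. Here the Series is indexed by 1 and 2 while the frame is indexed by names, so nothing matches and the whole column is NaN.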

In [368]:
scent = pd.Series(["Mint","Rose"], index=[1,2])
frame["Scent"]=scent
frame

Unnamed: 0,Magic,Mane Color,Age,Biome,Scent,scent
Sparkles,Glitter Bomb,Pink,210,Forest,,
Moonbeam,Icey Beam,White,3094,Artic,,
Rosey Pie,Petal Flurry,Red,200,Gardens,,
Sprinkles,Sugar Crash,Green,35,Clouds,,
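
A sketch of the same idea with labels that do match, assuming a small stand-in frame called `ponies`. Only the rows named in the Series get a value:

```python
import pandas as pd

ponies = pd.DataFrame(
    {"Age": [209, 3094, 200, 35]},
    index=["Sparkles", "Moonbeam", "Rosey", "Sprinkles"],
)

# Labels that match the frame's index get values; the rest become NaN.
scent = pd.Series(["Mint", "Rose"], index=["Moonbeam", "Rosey"])
ponies["Scent"] = scent

print(ponies)
```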


In [340]:
frame.index=["Sparkles","Moonbeam","Rosey Pie","Sprinkles"]
frame
#Tired of not having names on the index

Unnamed: 0,Magic,Mane Color,Age,Biome,Scent
Sparkles,Glitter Bomb,Pink,209,Forest,
Moonbeam,Icey Beam,White,3094,Artic,
Rosey Pie,Petal Flurry,Red,200,Gardens,
Sprinkles,Sugar Crash,Green,35,Clouds,


<h2>del to delete</h2>
Delete cols with del

In [341]:
del frame["Scent"]
frame

Unnamed: 0,Magic,Mane Color,Age,Biome
Sparkles,Glitter Bomb,Pink,209,Forest
Moonbeam,Icey Beam,White,3094,Artic
Rosey Pie,Petal Flurry,Red,200,Gardens
Sprinkles,Sugar Crash,Green,35,Clouds


<h2>DF made by dict of dicts</h2>
You can pass in a dict of dicts and pandas will name the columns with the outer keys and the index with the inner keys.

In [342]:
abel_personel = {"Job":{"Sam":"Radio Operator","Janine":"Director","Dr Myers":"Doctor"},
                 "Fav Food":{"Sam":"Curly Wurlies","Janine":"Potatoes","Dr Myers":"Apples"}}

abel_frame = pd.DataFrame(abel_personel)
abel_frame

Unnamed: 0,Job,Fav Food
Sam,Radio Operator,Curly Wurlies
Janine,Director,Potatoes
Dr Myers,Doctor,Apples


Reorganize the rows:

In [343]:
abp2=abel_frame.reindex(["Janine","Dr Myers","Sam"])
abp2

Unnamed: 0,Job,Fav Food
Janine,Director,Potatoes
Dr Myers,Doctor,Apples
Sam,Radio Operator,Curly Wurlies


<h1>5.2  Essential Functionality INTERPOLATION</h1>

Interpolation: You can fill in info that is missing from original sources:

In [344]:
fgames=pd.Series(["Portal","BioShock","Cult of the Lamb"], index=[0,3,5])
fgames

0              Portal
3            BioShock
5    Cult of the Lamb
dtype: object

<h2>np.arange and ffill Method</h2>
To stretch a Series, reindex it with np.arange(n); labels that were not there before get null cells. method="ffill" (forward fill) copies the last listed value into those nulls.

In [345]:
fgames.reindex(np.arange(6),method="ffill")
# This makes a new Series; it does not replace the old one.

0              Portal
1              Portal
2              Portal
3            BioShock
4            BioShock
5    Cult of the Lamb
dtype: object

<h2>.reshape(())</h2>
.reshape((rows, cols)) takes a tuple that specifies how many rows and columns the array should have.

In [346]:
exex=pd.DataFrame(np.arange(9).reshape((3,3)),
index=["Sam","Janine","Maxine"],columns=["Games Won","Games Lost","Games Tied"])
exex

Unnamed: 0,Games Won,Games Lost,Games Tied
Sam,0,1,2
Janine,3,4,5
Maxine,6,7,8


<h2>.reindex(index=["rows"])</h2>
index= labels the rows, columns= labels the cols. The cells in exex were filled with numbers by np.arange.

In [347]:
atgn=exex.reindex(index=["Sam","Jodie","Janine","maxine","Five"])
atgn

Unnamed: 0,Games Won,Games Lost,Games Tied
Sam,0.0,1.0,2.0
Jodie,,,
Janine,3.0,4.0,5.0
maxine,,,
Five,,,


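If you add rows with reindex, they will be filled with NaNs, and rows you leave out are dropped. Caps matter: "maxine" does not match "Maxine", so that row comes back empty. This DOES NOT alter the original table, so the result needs to be stored in a variable.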

In [348]:
gncols=["Games Won","Games Lost","Games Tied","Games Missed"]
atgn2=atgn.reindex(columns=gncols)
atgn2

Unnamed: 0,Games Won,Games Lost,Games Tied,Games Missed
Sam,0.0,1.0,2.0,
Jodie,,,,
Janine,3.0,4.0,5.0,
maxine,,,,
Five,,,,
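
If NaNs are not wanted, reindex also accepts fill_value. A minimal sketch with a made-up frame `scores`, which also shows that the original is left alone:

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    index=["Sam", "Janine"],
    columns=["Games Won", "Games Lost", "Games Tied"],
)

# The new label "Jodie" is filled with 0 instead of NaN.
expanded = scores.reindex(index=["Sam", "Jodie", "Janine"], fill_value=0)

print(expanded)
print(list(scores.index))   # the original still has only its two rows
```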


## drop("")
You can drop a row by its index label; drop returns a new table.

In [349]:
atgn3=atgn2.drop("Five")
atgn3

Unnamed: 0,Games Won,Games Lost,Games Tied,Games Missed
Sam,0.0,1.0,2.0,
Jodie,,,,
Janine,3.0,4.0,5.0,
maxine,,,,


## Data Alignment
When you make a DF from two Series of different lengths that share some indexes, you get a bunch of nulls.

In [350]:
a1=pd.Series([4.0,2.0,3.0,10.0], index=["Portal","Far Cry 5","Witness","Cult of the Lamb"])
a2=pd.Series([8.0,6.0,4.0], index=["Portal","Uncharted","Cult of the Lamb"])
a1

Portal               4.0
Far Cry 5            2.0
Witness              3.0
Cult of the Lamb    10.0
dtype: float64

In [351]:
a2

Portal              8.0
Uncharted           6.0
Cult of the Lamb    4.0
dtype: float64

In [352]:
a1+a2
# This adds the numeric values label by label. Labels found in only one Series come out as NaN.

Cult of the Lamb    14.0
Far Cry 5            NaN
Portal              12.0
Uncharted            NaN
Witness              NaN
dtype: float64

### .concat([s1,s2], axis=1)
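To see both Series side by side instead, concat them with axis=1. Each Series becomes a column, and the rows are the union of both indexes.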

In [353]:
grating=pd.concat([a1,a2], axis=1)
grating

Unnamed: 0,0,1
Portal,4.0,8.0
Far Cry 5,2.0,
Witness,3.0,
Cult of the Lamb,10.0,4.0
Uncharted,,6.0


### Fill Value
fill_value=0 treats a label missing from one Series as 0, so you can add two partly matching Series without getting NaNs.

In [354]:
bettergrating=a1.add(a2, fill_value=0)
bettergrating

Cult of the Lamb    14.0
Far Cry 5            2.0
Portal              12.0
Uncharted            6.0
Witness              3.0
dtype: float64
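
The other arithmetic methods take fill_value the same way. A small sketch with sub, using two short made-up Series `s1` and `s2`:

```python
import pandas as pd

s1 = pd.Series([4.0, 2.0], index=["Portal", "Witness"])
s2 = pd.Series([8.0, 6.0], index=["Portal", "Uncharted"])

plain = s1 - s2                     # NaN wherever a label is missing on one side
filled = s1.sub(s2, fill_value=0)   # a missing side counts as 0

print(plain)
print(filled)
```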

## Function Application and Mapping

### .random.standard_normal((tuple))
np.random.standard_normal fills cells with random floats drawn from the standard normal distribution. The tuple is the (rows, cols) shape, like in reshape.

In [355]:
randorate=pd.DataFrame(np.random.standard_normal((4,3)),
                       columns=["Hair","Smile","Style"],
                       index=["Sam","Janine","Maxine","Five"])
randorate

Unnamed: 0,Hair,Smile,Style
Sam,-0.300477,0.61478,1.091987
Janine,-1.635806,-0.015615,-1.670969
Maxine,0.849102,0.825574,-1.003929
Five,0.315965,-1.049139,1.536692


### np.abs(df name)
np.abs turns all negatives into positives.

In [356]:
np.abs(randorate)

Unnamed: 0,Hair,Smile,Style
Sam,0.300477,0.61478,1.091987
Janine,1.635806,0.015615,1.670969
Maxine,0.849102,0.825574,1.003929
Five,0.315965,1.049139,1.536692


### .apply()
Allows you to apply a function to the DF one column or one row at a time. By default the function gets each column.

In [357]:
def f1(x):
    return x.max()-x.min()

randorate.apply(f1)

Hair     2.484908
Smile    1.874713
Style    3.207661
dtype: float64

In [358]:
randorate.apply(f1,axis="columns")
# Run the function across the columns of each row, so you get one result per row

Sam       1.392464
Janine    1.655354
Maxine    1.853031
Five      2.585831
dtype: float64

In [359]:
def f2(x):
    return pd.Series([x.min(),x.max()], index=["Min","Max"])

randorate.apply(f2)
# Run a function that returns a series, runs across the cols.

Unnamed: 0,Hair,Smile,Style
Min,-1.635806,-1.049139,-1.670969
Max,0.849102,0.825574,1.536692


### .applymap()
This applies a function to each element in a DataFrame. In pandas 2.1 and later it is deprecated in favor of DataFrame.map().

### .map()
Applies a function to each element of a Series.
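
Neither method has an example above, so here is a hedged sketch. The frame `ratings` and the helper `two_places` are made up. Because the DataFrame method is named map in pandas 2.1+ and applymap in older versions, the code picks whichever one exists:

```python
import pandas as pd

ratings = pd.DataFrame(
    {"Hair": [-0.3, 0.85], "Smile": [0.61, -1.05]},
    index=["Sam", "Maxine"],
)

def two_places(x):
    """Format one number as text with two decimals."""
    return f"{x:.2f}"

# DataFrame.map exists in pandas 2.1+; older versions only have applymap.
elementwise = ratings.map if hasattr(ratings, "map") else ratings.applymap
formatted = elementwise(two_places)

# Series.map works element by element on a single column.
smile_text = ratings["Smile"].map(two_places)

print(formatted)
print(smile_text)
```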

## 5.2 Sorting and Ranking

### .sort_index()
Rearranges a DF in numeric or alphabetical order by its index. Pass axis="columns" to sort the column labels instead.

In [369]:
frame.sort_index()

Unnamed: 0,Magic,Mane Color,Age,Biome,Scent,scent
Moonbeam,Icey Beam,White,3094,Artic,,
Rosey Pie,Petal Flurry,Red,200,Gardens,,
Sparkles,Glitter Bomb,Pink,210,Forest,,
Sprinkles,Sugar Crash,Green,35,Clouds,,


In [370]:
frame.sort_index(axis="columns")

Unnamed: 0,Age,Biome,Magic,Mane Color,Scent,scent
Sparkles,210,Forest,Glitter Bomb,Pink,,
Moonbeam,3094,Artic,Icey Beam,White,,
Rosey Pie,200,Gardens,Petal Flurry,Red,,
Sprinkles,35,Clouds,Sugar Crash,Green,,


### .sort_values()
Sorts by values. Missing values (NaN) go to the end by default.

In [374]:
badex=pd.Series([2,4,1,6,8,np.nan])
badex

0    2.0
1    4.0
2    1.0
3    6.0
4    8.0
5    NaN
dtype: float64

In [375]:
badex.sort_values()

2    1.0
0    2.0
1    4.0
3    6.0
4    8.0
5    NaN
dtype: float64

In [377]:
randorate.sort_values("Smile")
# Passing a col name sorts the whole thing by the col's values.

Unnamed: 0,Hair,Smile,Style
Five,0.315965,-1.049139,1.536692
Janine,-1.635806,-0.015615,-1.670969
Sam,-0.300477,0.61478,1.091987
Maxine,0.849102,0.825574,-1.003929


In [378]:
randorate.sort_values(["Smile","Hair"])

Unnamed: 0,Hair,Smile,Style
Five,0.315965,-1.049139,1.536692
Janine,-1.635806,-0.015615,-1.670969
Sam,-0.300477,0.61478,1.091987
Maxine,0.849102,0.825574,-1.003929
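
Two sort_values options that the cells above do not use, sketched on a copy of the badex data (here called `nums`):

```python
import numpy as np
import pandas as pd

nums = pd.Series([2, 4, 1, 6, 8, np.nan])

descending = nums.sort_values(ascending=False)      # biggest first, NaN still last
nan_first = nums.sort_values(na_position="first")   # NaN moved to the top

print(descending)
print(nan_first)
```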


### .rank()
You might use the rank() function in pandas for several reasons:

Identifying the position of values:
Rank helps you determine the relative position of values within a column or row, allowing you to identify the top performers, bottom performers, or any specific rank.

Sorting data based on rank:
You can sort your DataFrame based on the ranks instead of the actual values, which can be useful in scenarios where the relative position is more important than the exact value.

Handling ties:
The rank() function provides different methods for handling ties, such as assigning the average rank, the minimum rank, or the maximum rank to tied values.

Creating new features:
Ranking can be used to create new features for machine learning models. For example, you might rank customers based on their spending and use that rank as a feature in a prediction model.

Filtering data:
You can use rank to filter data, such as selecting the top 10% of performers or the bottom 20% of performers.

In [379]:
badex.rank()

0    2.0
1    3.0
2    1.0
3    4.0
4    5.0
5    NaN
dtype: float64
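
badex has no ties, so the tie handling described above does not show up in its output. A small sketch with a made-up Series `scores` that has one tie:

```python
import pandas as pd

scores = pd.Series([7, 3, 7, 1], index=["a", "b", "c", "d"])

avg_rank = scores.rank()                                 # ties share the average rank
min_rank = scores.rank(method="min")                     # ties share the lowest rank
top_down = scores.rank(ascending=False, method="min")    # rank 1 is the largest value

print(avg_rank)
print(min_rank)
print(top_down)
```

With the default method, the two 7s would be ranks 3 and 4, so each gets 3.5.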

## Axis Indexes with Dupe Labels

### is_unique
The index's is_unique property tells you, as a boolean, whether the labels are all unique.
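
A minimal sketch, using a made-up Series `dupes` with repeated labels. Selecting a duplicated label gives back a Series, while a label that appears once gives a single value:

```python
import numpy as np
import pandas as pd

dupes = pd.Series(np.arange(5), index=["a", "a", "b", "b", "c"])

print(dupes.index.is_unique)   # False, because "a" and "b" repeat
print(dupes["a"])              # a Series holding both "a" rows
print(dupes["c"])              # a single value, because "c" appears once
```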