# Python Notes: Pandas
<hr>

### Creating and Accessing Dataframes

Libraries (dependencies) are pre-written code, including functions, classes, and objects  
Pandas is a library for data analysis  
> It has 2 primary data structures for manipulating data: DataFrames and Series  
  
A DataFrame is 2-dimensional, consisting of rows, columns, and values  
A Series is 1-dimensional, consisting of values and their index
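
A minimal sketch of the difference between the two structures. The names `viewers` and `stats` and all values are made up for illustration.

```python
import pandas as pd

# A Series is 1-dimensional: values plus an index
viewers = pd.Series([2, 1, 0], index=["Tue", "Wed", "Thu"], name="Unique Viewers")

# A DataFrame is 2-dimensional: rows, columns, and values
stats = pd.DataFrame(
    {"Unique Viewers": [2, 1, 0], "Follows": [0, 0, 0]},
    index=["Tue", "Wed", "Thu"],
)

print(viewers.ndim)   # 1
print(stats.ndim)     # 2
print(stats.shape)    # (3, 2) -> (rows, columns)

# Each column of a DataFrame is itself a Series
print(isinstance(stats["Follows"], pd.Series))   # True
```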

**import _myLibrary_ as _abbreviation_**  
imports the library and gives it a shorter alias to use when referencing it  
> pandas is commonly abbreviated to "pd" and numpy to "np"  

In [80]:
import pandas as pd

**_myPath_ = _myCSVPath_**  
**_myDataFrame_ = pd.read_csv(_myPath_)**  
> Reads a CSV file into a DataFrame  
  
**_myPath_ = _myExcelPath_**  
**_myDataFrame_ = pd.read_excel(_myPath_)**  
> Reads an Excel file into a DataFrame
  
**_myDataFrame_.head()**  
> Returns the first five rows of the DataFrame
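
A small self-contained sketch of reading and previewing, assuming an in-memory string in place of a real file so it runs anywhere. The CSV content and the names `csv_text` and `sample_df` are hypothetical.

```python
import io
import pandas as pd

# Made-up CSV content standing in for a file on disk
csv_text = """day,minutes
1,86
2,24
3,0
4,0
5,0
6,12
"""

# read_csv accepts a file path or a file-like object
sample_df = pd.read_csv(io.StringIO(csv_text))

first_five = sample_df.head()    # default: first 5 rows
first_two = sample_df.head(2)    # pass a number to change the count

print(first_five)
print(len(first_five), len(first_two))   # 5 2
```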

In [81]:
csvPath = "C:\\Users\\Binaryxx Sune\\Desktop\\Personal Files\\Stream\\Viewer Engagement - Analytics by day from Nov_23_2021 to Nov_22_2022 - Copy.csv"
myDF = pd.read_csv(csvPath)

myDF.head()

Unnamed: 0,Date,Minutes Watched,Unique Viewers,Average Viewers,Follows,Chatters,Minutes Streamed,Live Views,Max Viewers,Hosts and Raids Viewers (%),...,Clip Views,Prime Subs,Total Paid Subs,Tier 1 subs,Total Gifted Subs,Gifted Tier 1 subs,Total Multi-Month Gifted subs,Multi-Month Gifted Tier 1 subs,Multi-Month Gifted Tier 2 subs,Multi-Month Gifted Tier 3 subs
0,Tue Nov 23 2021,86,2,0.92,0,1,93,2,1,0.0,...,0,0,0,0,0,0,0,0,0,0
1,Wed Nov 24 2021,24,1,1.33,0,0,18,3,3,0.0,...,0,0,0,0,0,0,0,0,0,0
2,Thu Nov 25 2021,0,0,0.0,0,0,0,0,0,0.0,...,0,0,0,0,0,0,0,0,0,0
3,Fri Nov 26 2021,0,0,0.0,0,0,0,0,0,0.0,...,0,0,0,0,0,0,0,0,0,0
4,Sat Nov 27 2021,0,0,0.0,0,0,0,0,0,0.0,...,0,0,0,0,0,0,0,0,0,0


DataFrames are made up of rows and columns  
> A df can be created from a dictionary  
>> The keys are the column names; each value is a list holding that column's values, one per row  
> A df can also be created from 3 lists: the first is a list of nested lists, one per row, holding that row's values; the second holds the row labels; the third holds the column labels  
>> the row labels become the index, replacing the default numbered sequence  
  
**_myDF_ = pd.DataFrame(_myDict_)**  
Converts the dictionary into a DataFrame  

In [99]:
dataList = [[11,12,13], [21,22,23], [31,32,33]]
rowList = ["a","b","c"]
columnList = ["x","y","z"]

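# keyword arguments make it explicit which list is the index and which is the columns
myDF2 = pd.DataFrame(dataList, index=rowList, columns=columnList)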

myDF2

Unnamed: 0,x,y,z
a,11,12,13
b,21,22,23
c,31,32,33


In [82]:
myDict = {"Column1": ["Row11","Row12","Row13","Row13"],"Column2":["Row21","Row22","Row23","Row23"], "Column3":["Row31","Row32","Row33","Row33"]}

myDF = pd.DataFrame(myDict)
myDF.head()

Unnamed: 0,Column1,Column2,Column3
0,Row11,Row21,Row31
1,Row12,Row22,Row32
2,Row13,Row23,Row33
3,Row13,Row23,Row33


**_newDF_ = _myDF_[["Column1", ..., "ColumnN"]]**  
Creates a new df containing only the listed columns  
> Using a single set of brackets with one column name returns a Series instead of a DataFrame
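
A quick sketch of single versus double brackets. `demo` is a made-up frame in the same style as the examples in these notes.

```python
import pandas as pd

demo = pd.DataFrame({
    "Column1": ["Row11", "Row12"],
    "Column2": ["Row21", "Row22"],
    "Column3": ["Row31", "Row32"],
})

one_col_series = demo["Column1"]              # single brackets -> Series
one_col_frame = demo[["Column1"]]             # double brackets -> DataFrame with 1 column
two_col_frame = demo[["Column1", "Column3"]]  # double brackets -> DataFrame with 2 columns

print(type(one_col_series).__name__)   # Series
print(type(one_col_frame).__name__)    # DataFrame
print(two_col_frame.shape)             # (2, 2)
```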

In [83]:
newDF = myDF[["Column1","Column3"]]
newDF.head()

Unnamed: 0,Column1,Column3
0,Row11,Row31
1,Row12,Row32
2,Row13,Row33
3,Row13,Row33


<hr>

### Working with and Saving Data

**_myDF_["_Column1_"].unique()**  
returns the unique elements in the column  
  
**_df_["_Column1_"] _comparison operator and value_**  
returns a Series of boolean values, indicating for each row whether the condition is True or False  
  
**_df1_ = _df_[ _df_["_Column1_"] _comparison operator and value_ ]**  
creates a new DataFrame consisting of all rows for which the condition is True  
  


In [84]:
newDF["Column1"].unique()

array(['Row11', 'Row12', 'Row13'], dtype=object)

In [85]:
newDF["Column1"] == "Row12"

0    False
1     True
2    False
3    False
Name: Column1, dtype: bool

In [86]:
df1 = newDF [ newDF["Column1"] == "Row12"  ]
df1.head()

# returns the row that matches, including other columns

Unnamed: 0,Column1,Column3
1,Row12,Row32


**_myDF_.loc[_row label_,_column label_]**  
returns the value at the given row label and column label  
>> when slicing, the end of the range is inclusive, even with the default integer index of a dataframe where no index has been set  
>>> rows and columns can each be given as a single label or as a sliced range; slicing both returns a DataFrame  
  
**_myDF_.iloc[_row index_,_column index_]**  
returns the value at the given row position and column position (integers, counting from 0)  
> iloc only accepts integer positions; to refer to a row by name, set an index and use loc  
>> when slicing, the end of the range is exclusive  
>>> rows and columns can each be given as a single position or as a sliced range; slicing both returns a DataFrame
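
A minimal sketch comparing how the two slice. `demo` is a made-up frame with the same layout as the example table in these notes.

```python
import pandas as pd

demo = pd.DataFrame({
    "Column1": ["Row11", "Row12", "Row13", "Row13"],
    "Column2": ["Row21", "Row22", "Row23", "Row23"],
    "Column3": ["Row31", "Row32", "Row33", "Row33"],
})

# loc slices by label: the end label is included
by_label = demo.loc[0:2, "Column1":"Column2"]
print(by_label.shape)      # (3, 2): rows 0, 1, 2 and two columns

# iloc slices by position: the end position is excluded
by_position = demo.iloc[0:2, 0:2]
print(by_position.shape)   # (2, 2): rows 0, 1 and two columns

# Single values can be looked up either way
print(demo.loc[1, "Column2"])   # Row22
print(demo.iloc[1, 1])          # Row22
```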

**_myDF_.set_index(_column to make an index_)**  
returns a dataframe with the selected column as the index, so the values of that column can be used to refer to the rows in place of the default numbered sequence  
> set_index returns a new dataframe, so the result has to be assigned to a variable to keep it  
>> to use a list as the index directly, without first adding it as a column, the following can be used:  
**_myDF_.index = _myIndexList_**  
  
**_myDF_["_new column name_"] = myList**  
creates a new column whose values come from the given list, one per row
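
A short sketch of the three operations above, assuming a made-up two-row frame. The names `demo`, `indexed`, `relabeled`, and the `"Fruit"` column are hypothetical.

```python
import pandas as pd

demo = pd.DataFrame({"Column1": ["Row11", "Row12"], "Column2": ["Row21", "Row22"]})

# New column from a list; the list length must match the number of rows
demo["Fruit"] = ["apple", "banana"]

# set_index returns a new DataFrame; demo keeps its numbered index
indexed = demo.set_index("Fruit")
print(list(demo.index))                   # [0, 1]
print(list(indexed.index))                # ['apple', 'banana']
print(indexed.loc["banana", "Column2"])   # Row22

# Assigning to .index relabels the rows directly, without adding a column
relabeled = demo.copy()
relabeled.index = ["a", "b"]
print(relabeled.loc["b", "Column1"])      # Row12
```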

In [87]:
myDF.head()

Unnamed: 0,Column1,Column2,Column3
0,Row11,Row21,Row31
1,Row12,Row22,Row32
2,Row13,Row23,Row33
3,Row13,Row23,Row33


In [88]:
myDF.iloc[1,1]

'Row22'

In [89]:
# copy so that adding the "Index" column below does not also modify myDF
indexedDF = myDF.copy()

indexList = ["apple","banana","coconut","dragonfruit"]
indexList2 = ["a","b","c","d"]
indexedDF["Index"] = indexList

indexedDF = indexedDF.set_index("Index")


#alternatively:
#indexedDF.index = indexList2


indexedDF.head()

Index,Column1,Column2,Column3
apple,Row11,Row21,Row31
banana,Row12,Row22,Row32
coconut,Row13,Row23,Row33
dragonfruit,Row13,Row23,Row33


In [90]:
indexedDF.loc["banana","Column2"]

'Row22'

In [91]:
#indexedDF.loc["apple":"coconut","Column1":"Column2"]

#when the index is not set
myDF.loc[0:2,"Column1":"Column2"]

Unnamed: 0,Column1,Column2
0,Row11,Row21
1,Row12,Row22
2,Row13,Row23


**_df1_.to_csv("_newFileName_")**  
Saves the dataframe as a CSV file  

In [92]:
df1.to_csv("TestCSV")
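
A minimal sketch of saving and reading back, assuming a temporary folder so nothing is left on disk. The file names and the `demo` frame are hypothetical. By default to_csv also writes the index as the first column; `index=False` leaves it out.

```python
import os
import tempfile
import pandas as pd

demo = pd.DataFrame({"Column1": ["Row12"], "Column3": ["Row32"]}, index=[1])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Default: the index is written as a first column with an empty header
    out_path = os.path.join(tmp_dir, "TestCSV.csv")
    demo.to_csv(out_path)
    with_index = pd.read_csv(out_path)             # index comes back as a regular column
    restored = pd.read_csv(out_path, index_col=0)  # first column used as the index again

    # index=False: only the data columns are written
    no_index_path = os.path.join(tmp_dir, "TestCSV_no_index.csv")
    demo.to_csv(no_index_path, index=False)
    no_index = pd.read_csv(no_index_path)

print(list(with_index.columns))   # ['Unnamed: 0', 'Column1', 'Column3']
print(list(restored.columns))     # ['Column1', 'Column3']
print(list(no_index.columns))     # ['Column1', 'Column3']
```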