
In [None]:
import pandas as pd

### read_csv

Read a comma-separated values (CSV) file into a DataFrame.<br><br>
<b> Parameters: </b>
- filepath_or_buffer: Location of the file to read (a local path, a URL, or a file-like object)

In [None]:
file_address = "D:/Sports Analysis/Sports Analytics Course Content/Sample_CSV.csv"
df = pd.read_csv(file_address)

In [None]:
df

Unnamed: 0,Name,Team,Runs,Balls,Not Out
0,A,Ind,54,80,0
1,B,Aus,63,25,0
2,C,Ind,95,96,0
3,D,Eng,12,20,0
4,E,Ind,3,8,0
5,F,Eng,6,2,1
6,G,Eng,45,50,1
7,H,Eng,89,101,0
8,I,Aus,25,40,0
9,J,Ind,36,18,0
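Because `filepath_or_buffer` also accepts file-like objects, the same call can be tried without any file on disk. The sketch below assumes a small hypothetical CSV string (`csv_text`) that borrows two rows from the table above.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = "Name,Team,Runs\nA,Ind,54\nB,Aus,63\n"

# read_csv accepts a file-like object as well as a path or URL.
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)
```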


### DataFrame

Converts a dictionary or 2-D list into a DataFrame, a 2-dimensional labeled data structure whose columns can hold different types.

<b> Parameters: </b>

- data: a 2-D list or dictionary to convert into a DataFrame
- index: row labels to use for the resulting frame
- columns: column labels to use for the resulting frame

In [None]:
# Keep the raw 2-D list under its own name so df always refers to a DataFrame
data = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
df = pd.DataFrame(data, index=['A', 'B', 'C'], columns=['X', 'Y', 'Z'])

In [None]:
df

Unnamed: 0,X,Y,Z
A,0,1,2
B,3,4,5
C,6,7,8
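The description mentions dictionaries as well as 2-D lists. A minimal sketch of the dictionary form follows, where each key becomes a column label; the name `df_from_dict` is only illustrative.

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values.
data = {"X": [0, 3, 6], "Y": [1, 4, 7], "Z": [2, 5, 8]}
df_from_dict = pd.DataFrame(data, index=["A", "B", "C"])
print(df_from_dict)
```

This should produce the same frame as the 2-D list version above.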


### to_csv

Save a DataFrame as a CSV file.

<b> Parameters: </b>

- path_or_buf: path where the file is to be saved

In [None]:
file_address = "D:/Sports Analysis/Sports Analytics Course Content/to_csv.csv"
df.to_csv(file_address)
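To see what actually gets written, `to_csv` can be called without a path, in which case it returns the CSV text as a string. The sketch below uses a small illustrative frame (`frame`) and also shows the `index=False` option, which leaves the row labels out.

```python
import io
import pandas as pd

frame = pd.DataFrame([[0, 1], [2, 3]], index=["A", "B"], columns=["X", "Y"])

# With no path given, to_csv returns the CSV text instead of writing a file.
with_index = frame.to_csv()
without_index = frame.to_csv(index=False)  # omit the row labels
print(with_index)
print(without_index)

# Reading the text back, treating the first column as the index.
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(restored)
```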

### loc

Access a group of rows and columns by label(s) or a boolean array.

<b> Allowed inputs: </b>

- A single index label
- A list of index labels
- A slice of index labels (both endpoints are included)
- Column labels, given after the row selection

In [None]:
print(df.loc['A'])

X    0
Y    1
Z    2
Name: A, dtype: int64


In [None]:
print(df.loc[['A', 'C']])

   X  Y  Z
A  0  1  2
C  6  7  8


In [None]:
print(df.loc['A', 'X'])

0
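The remaining input types (a label slice, column labels, and a boolean array) can be sketched as follows. The frame is rebuilt here under the illustrative name `df_demo` so the snippet runs on its own.

```python
import pandas as pd

df_demo = pd.DataFrame([[0, 1, 2], [3, 4, 5], [6, 7, 8]],
                       index=["A", "B", "C"], columns=["X", "Y", "Z"])

# Label slices include both endpoints, unlike ordinary Python slices.
rows_a_to_b = df_demo.loc["A":"B"]

# All rows, only the columns X and Z.
cols_xz = df_demo.loc[:, ["X", "Z"]]

# Boolean array: keep the rows where Y is greater than 1.
big_y = df_demo.loc[df_demo["Y"] > 1]

print(rows_a_to_b)
print(cols_xz)
print(big_y)
```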


### iloc

Purely integer-location based indexing for selection by position. Rows are accessed by their integer position in the DataFrame, not by their labels.

<b> Allowed inputs: </b>

- A single integer position
- A list of integer positions
- A slice of integer positions (the end position is excluded)

In [None]:
print(df.iloc[0])

X    0
Y    1
Z    2
Name: A, dtype: int64


In [None]:
print(df.iloc[[0,2]])

   X  Y  Z
A  0  1  2
C  6  7  8


In [None]:
print(df.iloc[0:2])

   X  Y  Z
A  0  1  2
B  3  4  5
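`iloc` also takes a column position after the row position. A short sketch, again rebuilding the frame under the illustrative name `df_demo`:

```python
import pandas as pd

df_demo = pd.DataFrame([[0, 1, 2], [3, 4, 5], [6, 7, 8]],
                       index=["A", "B", "C"], columns=["X", "Y", "Z"])

# Row 0, column 0: a single value.
first_cell = df_demo.iloc[0, 0]

# Rows 0-1 and columns 1-2; the end of each slice is excluded.
block = df_demo.iloc[0:2, 1:3]

# Negative positions count from the end, as with Python lists.
last_row = df_demo.iloc[-1]

print(first_cell)
print(block)
print(last_row)
```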


### groupby

Group a DataFrame using a column or a list of columns.

<b> Parameters: </b>

- by: the column(s) used to determine the groups
- axis: 0 (grouping along rows), 1 (grouping along columns); deprecated in recent pandas releases

In [None]:
grp = df.groupby(["Z"]).mean()

In [None]:
grp

Z,X,Y
2,0,1
5,3,4
8,6,7
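Grouping is easier to follow when a key repeats. The sketch below uses four rows taken from the sample CSV shown earlier (the `scores` frame is rebuilt by hand here) and totals the runs per team.

```python
import pandas as pd

# Four rows from the sample CSV: players A, B, C and D.
scores = pd.DataFrame({
    "Team": ["Ind", "Aus", "Ind", "Eng"],
    "Runs": [54, 63, 95, 12],
})

# One group per distinct Team value; sum the Runs within each group.
runs_by_team = scores.groupby("Team")["Runs"].sum()
print(runs_by_team)
```

The group keys become the index of the result and are sorted by default.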


### drop

Drop specified labels from rows or columns.

<b> Parameters: </b>

- labels: Index or column labels to drop
- axis: 0 (along the rows), 1 (along the columns)

In [None]:
df = df.drop("X", axis=1)

In [None]:
df

Unnamed: 0,Y,Z
A,1,2
B,4,5
C,7,8
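The same method drops rows when `axis=0`, and the `columns=` keyword is an alternative to passing `axis=1`. A minimal sketch on a fresh copy of the frame (named `df_demo` for illustration):

```python
import pandas as pd

df_demo = pd.DataFrame([[0, 1, 2], [3, 4, 5], [6, 7, 8]],
                       index=["A", "B", "C"], columns=["X", "Y", "Z"])

# Drop a row by its index label.
without_b = df_demo.drop("B", axis=0)

# Drop several columns at once using the columns keyword.
without_xy = df_demo.drop(columns=["X", "Y"])

print(without_b)
print(without_xy)
```

`drop` returns a new DataFrame, so `df_demo` itself is left unchanged unless the result is assigned back.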


### to_datetime, to_numeric

#### to_datetime

Convert argument to datetime.

##### Parameters:

- arg: The object to convert to a datetime (can be an int, str, DataFrame, or Series)

#### to_numeric

Convert argument to a numeric type (by default int64 or float64).

##### Parameters:

- arg: The object to convert to a numeric type (can be a scalar, list, tuple, 1-D array, or Series)

In [None]:
df["dates"] = ["26-jan-2021", "2-oct-2020", "15-aug-2019"]

In [None]:
df

Unnamed: 0,Y,Z,dates
A,1,2,26-jan-2021
B,4,5,2-oct-2020
C,7,8,15-aug-2019


In [None]:
df["dates"] = pd.to_datetime(df["dates"])

In [None]:
df

Unnamed: 0,Y,Z,dates
A,1,2,2021-01-26
B,4,5,2020-10-02
C,7,8,2019-08-15


In [None]:
df["strng"] = ["16", "5", "99"]

In [None]:
df

Unnamed: 0,Y,Z,dates,strng
A,1,2,2021-01-26,16
B,4,5,2020-10-02,5
C,7,8,2019-08-15,99


In [None]:
df["strng"] = pd.to_numeric(df["strng"])

In [None]:
df

Unnamed: 0,Y,Z,dates,strng
A,1,2,2021-01-26,16
B,4,5,2020-10-02,5
C,7,8,2019-08-15,99
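Real data often contains values that cannot be converted. Both functions accept an `errors` argument; the sketch below uses `errors="coerce"`, which replaces unparseable entries with `NaT` or `NaN` instead of raising. The input values are made up for illustration.

```python
import pandas as pd

# The second entry is deliberately not a date.
raw_dates = pd.Series(["2021-01-26", "not a date"])
parsed = pd.to_datetime(raw_dates, errors="coerce")

# The third entry is deliberately not a number.
raw_numbers = pd.Series(["16", "5", "n/a"])
numbers = pd.to_numeric(raw_numbers, errors="coerce")

print(parsed)
print(numbers)
```

Because `NaN` is a float, the numeric result is float64 rather than int64 when any value is coerced.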


### sort_values

Sort the dataframe by the values along either axis.

<b> Parameters: </b>

- by: Name or list of names to sort by.
- axis: 0 (sort rows), 1 (sort columns)
- ascending: True for ascending order (the default), False for descending

In [None]:
sorted_df = df.sort_values(by="Y")

In [None]:
sorted_df

Unnamed: 0,Y,Z,dates,strng
A,1,2,2021-01-26,16
B,4,5,2020-10-02,5
C,7,8,2019-08-15,99
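The frame above is already ordered by Y, so the sort leaves it unchanged. A sketch with four rows from the sample CSV (rebuilt by hand as `scores`) shows descending order and sorting by more than one column.

```python
import pandas as pd

scores = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Team": ["Ind", "Aus", "Ind", "Eng"],
    "Runs": [54, 63, 95, 12],
})

# Highest score first.
by_runs_desc = scores.sort_values(by="Runs", ascending=False)

# Teams alphabetically, then highest score first within each team.
by_team_then_runs = scores.sort_values(by=["Team", "Runs"],
                                       ascending=[True, False])

print(by_runs_desc)
print(by_team_then_runs)
```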


### describe, mean, mode, median

<b> Describe: </b> Generates descriptive statistics for a DataFrame or Series<br>
<b> Mean: </b> Gives the mean of each column<br>
<b> Median: </b> Gives the median of each column<br>
<b> Mode: </b> Gives the most frequent value(s) of each column; when every value occurs once, all of them are returned in sorted order

In [None]:
df.describe()

Unnamed: 0,Y,Z,strng
count,3.0,3.0,3.0
mean,4.0,5.0,40.0
std,3.0,3.0,51.390661
min,1.0,2.0,5.0
25%,2.5,3.5,10.5
50%,4.0,5.0,16.0
75%,5.5,6.5,57.5
max,7.0,8.0,99.0


In [None]:
df.mean(numeric_only=True)  # skip the non-numeric dates column

Y         4.0
Z         5.0
strng    40.0
dtype: float64

In [None]:
df.median(numeric_only=True)  # skip the non-numeric dates column

Y         4.0
Z         5.0
strng    16.0
dtype: float64

In [None]:
df.mode()

Unnamed: 0,Y,Z,dates,strng
0,1,2,2019-08-15,5
1,4,5,2020-10-02,16
2,7,8,2021-01-26,99
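In the output above every value occurs exactly once, so each column's mode is simply all of its values in sorted order. A small sketch with repeated values, using made-up Series, shows the more typical behaviour.

```python
import pandas as pd

# "Ind" appears three times, more than any other value.
teams = pd.Series(["Ind", "Aus", "Ind", "Eng", "Ind", "Eng"])
top_team = teams.mode()

# 1 and 2 both appear twice, so both are returned.
tied = pd.Series([1, 1, 2, 2, 3]).mode()

print(top_team)
print(tied)
```

`mode` always returns a Series (or DataFrame), even when there is a single most frequent value.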
