![alt text](assets/Pandas_logo.png)


# pandas: Basics

This notebook contains code examples that serve as an introduction to pandas.

The documentation of this package can be found here: https://pandas.pydata.org/docs/

## DataFrame

A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.

Doc: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html
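The "dict of Series" view can be made concrete with a small sketch. The names `names`, `ages`, and `people`, as well as the sample values, are made up for illustration and are not part of the notebook's datasets.

```python
import pandas as pd

# Two Series that share the same index labels (hypothetical sample data).
names = pd.Series(["Ann", "Bob", "Cy"], index=["a", "b", "c"])
ages = pd.Series([31, 45, 27], index=["a", "b", "c"])

# A dict of Series becomes a DataFrame: the keys are the column names,
# and the Series are aligned on their index labels.
people = pd.DataFrame({"name": names, "age": ages})

print(people)
print(people.dtypes)  # each column keeps its own dtype
```

Each column can hold a different type, which is what distinguishes a DataFrame from a plain 2-dimensional array.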

## First Steps

### Importing pandas
The next cell imports the pandas package and sets the maximum number of rows to display.

Doc: https://pandas.pydata.org/docs/user_guide

In [None]:
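import pandas as pd
# Show at most 10 rows before the output is truncated
pd.options.display.max_rows = 10
# None means the truncated view follows max_rows
pd.options.display.min_rows = None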
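Display options can also be read back or changed only temporarily. The sketch below uses `pd.get_option` and `pd.option_context`; the value 5 and the variable names are arbitrary choices for illustration.

```python
import pandas as pd

# Read the current setting.
before = pd.get_option("display.max_rows")

# Change the setting only inside the with-block.
with pd.option_context("display.max_rows", 5):
    inside = pd.get_option("display.max_rows")

# The previous value is restored once the block ends.
after = pd.get_option("display.max_rows")
print(before, inside, after)
```

This can be handy when a single cell needs a longer or shorter display without affecting the rest of the notebook.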

### Reading a csv
Next, you will import a csv file into a DataFrame. `read_csv` accepts many options; see the documentation for reference: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html?highlight=read_csv#pandas.read_csv

In [None]:
titanic = pd.read_csv("titanic.csv")
titanic

In [None]:
type(titanic)

In [None]:
titanic_sep = pd.read_csv("titanic.csv", sep='|', header=None, na_values="other_null")
titanic_sep
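To see what `sep`, `header`, and `na_values` do without depending on a file, here is a minimal sketch that reads from an in-memory string. The data in `raw` is invented for illustration and assumes a pipe-separated layout with no header row.

```python
import io
import pandas as pd

# Hypothetical pipe-separated data without a header row;
# the string "other_null" marks a missing value.
raw = "1|Ann|31\n0|Bob|other_null\n1|Cy|27\n"

sample = pd.read_csv(
    io.StringIO(raw),         # any file-like object works in place of a path
    sep="|",                  # fields are separated by pipes, not commas
    header=None,              # no header row: columns are numbered 0, 1, 2
    na_values="other_null",   # treat this string as NaN
)

print(sample)
print(sample.isna().sum())  # one missing value in the last column
```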

### Writing a csv
`to_csv` allows you to write a DataFrame to a csv file. There are plenty of options you can use for this: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html

In [None]:
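# Default settings: comma separator, header row, and the index as the first column
titanic.to_csv('titanic_from_df.csv')
# Same path, so this call overwrites the file written above
titanic.to_csv('titanic_from_df.csv', sep='|', header=False, index=False, na_rep="NAN")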
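The effect of these options is easier to see on a tiny example. When no path is given, `to_csv` returns the csv text as a string, so the two variants can be compared directly. The frame `df_out` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with one missing value.
df_out = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31.0, None]})

# With no path, to_csv returns the csv text instead of writing a file.
default_csv = df_out.to_csv()
custom_csv = df_out.to_csv(sep="|", header=False, index=False, na_rep="NAN")

print(default_csv)  # includes the header row and the index column
print(custom_csv)   # pipe-separated, no header, no index, NaN written as NAN
```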

## Basic DF Functions, Attributes and Methods

In [None]:
titanic.head(2)

In [None]:
titanic.tail(2)

In [None]:
titanic.columns

In [None]:
titanic.index

In [None]:
titanic.info()

In [None]:
titanic.dtypes

In [None]:
titanic.describe()

In [None]:
len(titanic)

In [None]:
round(titanic, 0).head()

In [None]:
titanic.size

## pandas.Series

One-dimensional ndarray with axis labels (including time series).

Doc: https://pandas.pydata.org/docs/reference/api/pandas.Series.html?highlight=series#pandas.Series

In [None]:
titanic.head()

In [None]:
titanic["age"]

In [None]:
type(titanic["age"])

In [None]:
titanic[["age"]]

In [None]:
type(titanic[["age"]])

In [None]:
# titanic["age", "sex"]  # raises KeyError: selecting several columns needs a list, i.e. double brackets

In [None]:
titanic[["age", "sex"]]

In [None]:
titanic[["sex", "age", "fare"]]
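The difference between single and double brackets can be summarized in a self-contained sketch. The frame `passengers` is a hypothetical stand-in for a few titanic columns, with invented values.

```python
import pandas as pd

# Hypothetical stand-in for a few titanic columns.
passengers = pd.DataFrame(
    {
        "age": [22.0, 38.0, 26.0],
        "sex": ["male", "female", "female"],
        "fare": [7.25, 71.28, 7.92],
    }
)

age_series = passengers["age"]       # single brackets: one column as a Series
age_frame = passengers[["age"]]      # double brackets: a one-column DataFrame
subset = passengers[["sex", "age"]]  # columns come back in the order requested

print(type(age_series).__name__, age_series.shape)
print(type(age_frame).__name__, age_frame.shape)
print(list(subset.columns))
```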

## Creating series and DFs

Each element of a Series is identified by an index label. You can create new Series or DataFrames from lists, arrays, dictionaries, and existing Series objects.

In [None]:
data = [1000, 2000, 3000, 4000, 5000]
s = pd.Series(data)
print(s)

In [None]:
data = [1000, 2000, 3000, 4000, 5000]
df = pd.DataFrame(data, columns=['Column1'])
print(df)
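The cells above start from a list. As a sketch of the other sources mentioned, the following builds Series and DataFrames from a dict, a NumPy array, and existing Series. All names and values are illustrative.

```python
import numpy as np
import pandas as pd

# From a dict: the keys become the index labels.
s_dict = pd.Series({"a": 1000, "b": 2000, "c": 3000})

# From a NumPy array, with an explicit index.
s_array = pd.Series(np.array([1.5, 2.5, 3.5]), index=["a", "b", "c"])

# From existing Series: they are aligned on their index labels.
df_from_series = pd.DataFrame({"amount": s_dict, "rate": s_array})

# From a dict of lists: each key becomes a column.
df_from_dict = pd.DataFrame({"Column1": [1, 2], "Column2": ["x", "y"]})

print(df_from_series)
print(df_from_dict)
```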

In [None]:
titanic.age

In [None]:
titanic.age.equals(titanic["age"])

### Selecting Rows with Square Brackets (not advisable)

In [None]:
titanic.head()

In [None]:
titanic[0:1]

In [None]:
titanic[4:8]

In [None]:
titanic[:10]

In [None]:
titanic[-10:]
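One plausible reason square brackets are discouraged for rows is that the same operator means different things depending on what it receives. The sketch below illustrates this on a made-up frame `df_rows`; it is an illustration of the ambiguity, not an exhaustive description of the rules.

```python
import pandas as pd

# Hypothetical sample data with the default integer index.
df_rows = pd.DataFrame({"age": [22, 38, 26, 35]})

first_two = df_rows[0:2]  # a slice selects ROWS by position
column = df_rows["age"]   # a string selects a COLUMN

# A single integer is looked up as a column label, so this fails here.
try:
    df_rows[0]
    raised = False
except KeyError:
    raised = True

# iloc states the intent explicitly and returns the same rows as the slice.
explicit = df_rows.iloc[0:2]
print(raised, first_two.equals(explicit))
```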

### Indexing Operator iloc (location based indexing) 

#### Selecting Rows with iloc

`.iloc[]` selects rows and columns by integer position. It is primarily integer-position based (from 0 to length-1 of the axis), but it can also be used with a boolean array.

In [None]:
titanic.iloc[0]

In [None]:
type(titanic.iloc[0])

In [None]:
titanic.iloc[-1]

In [None]:
titanic.iloc[:5]

In [None]:
titanic.iloc[-5:]

In [None]:
titanic.iloc[456:459]

In [None]:
titanic.iloc[[2,45,765]]

In [None]:
titanic.iloc[0,0:3]

In [None]:
titanic.iloc[:,[0,2,6,8]]

In [None]:
titanic.iloc[0,[0,2,6,8]]

In [None]:
titanic.iloc[34:39,[0,2,6,8]]
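The `iloc` patterns used above, plus the boolean-array form mentioned in the description, can be tried on a small frame. The frame `grid` and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data.
grid = pd.DataFrame(
    {"a": [10, 20, 30, 40], "b": [1.0, 2.0, 3.0, 4.0], "c": list("wxyz")}
)

row = grid.iloc[0]              # one row comes back as a Series
block = grid.iloc[1:3, [0, 2]]  # rows 1 and 2 (end excluded), columns 0 and 2
last = grid.iloc[-1, 0]         # scalar: last row, first column

# A boolean list with one entry per row keeps the rows marked True.
masked = grid.iloc[[True, False, True, False]]

print(block)
print(masked)
```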

#### Selecting Columns with iloc

In [None]:
titanic.iloc[:, 0].equals(titanic.survived)

In [None]:
titanic["survived"]

### Indexing Operator loc (label based indexing)

In [None]:
medals = pd.read_csv("summer.csv", index_col="Athlete")

medals_wo_index = pd.read_csv("summer.csv")

In [None]:
medals_wo_index.head()

In [None]:
medals.head()

#### Selecting Rows with loc

With `.loc[]` you can access a group of rows and columns by label(s) or a boolean array. It is primarily label based, but it can also be used with a boolean array.

In [None]:
medals.loc["DRIVAS, Dimitrios"]

In [None]:
medals.loc["PHELPS, Michael", "Medal"]

In [None]:
medals.loc["PHELPS, Michael"].iloc[0]

#### Slicing Rows and Columns with loc

In [None]:
medals.loc["PHELPS, Michael", ["Event","Medal"]]

In [None]:
medals.loc[["PHELPS, Michael", "LEWIS, Carl"], ["Event","Medal"]]

In [None]:
medals.loc["DRIVAS, Dimitrios":"BLAKE, Arthur"]

In [None]:
medals.loc["HAJOS, Alfred", "Year":"Discipline"]
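The same `loc` patterns can be tried without `summer.csv`. The frame `mini` below is a hypothetical miniature of a medals table; the athlete names and results are invented. It also shows two details worth knowing: a repeated index label returns several rows, and label slices include both endpoints.

```python
import pandas as pd

# Hypothetical miniature of a medals table, indexed by (invented) athlete names.
mini = pd.DataFrame(
    {
        "Year": [2000, 2004, 2004, 2008],
        "Event": ["100M", "200M", "400M", "200M"],
        "Medal": ["Gold", "Silver", "Gold", "Bronze"],
    },
    index=["ADAMS, Ann", "BAKER, Bob", "CLARK, Cy", "CLARK, Cy"],
)

one = mini.loc["ADAMS, Ann"]            # unique label: a Series
dupes = mini.loc["CLARK, Cy"]           # repeated label: a DataFrame with 2 rows
cell = mini.loc["BAKER, Bob", "Medal"]  # row and column label: a scalar

# Label slices include BOTH endpoints, unlike positional slices.
span = mini.loc["ADAMS, Ann":"BAKER, Bob", "Year":"Event"]

# A boolean array keeps the rows where the condition is True.
gold = mini.loc[mini["Medal"] == "Gold"]

print(span)
print(gold)
```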

## Slicing errors

If a row label or column name is not found, `.loc[]` raises a `KeyError`.

In [None]:
medals.loc["PHELPS, Michael", ["Year", "Age"]]

In [None]:
medals.loc["Other", ["Year", "City"]]
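One way to deal with missing labels is to catch the `KeyError`, or to check membership before the lookup. The helper `safe_loc` and the frame `table` below are hypothetical and only sketch the idea.

```python
import pandas as pd

# Hypothetical sample data with invented athlete names.
table = pd.DataFrame(
    {"Year": [2000, 2004], "City": ["Sydney", "Athens"]},
    index=["ADAMS, Ann", "BAKER, Bob"],
)


def safe_loc(frame, row, cols):
    """Hypothetical helper: return frame.loc[row, cols], or None if a label is missing."""
    try:
        return frame.loc[row, cols]
    except KeyError as err:
        print(f"Lookup failed: {err}")
        return None


found = safe_loc(table, "ADAMS, Ann", ["Year", "City"])
missing_col = safe_loc(table, "ADAMS, Ann", ["Year", "Age"])  # "Age" is not a column
missing_row = safe_loc(table, "Other", ["Year", "City"])      # "Other" is not in the index

# Checking membership first avoids the exception altogether.
has_age = "Age" in table.columns
has_other = "Other" in table.index
print(has_age, has_other)
```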


# BONUS!

Check out the included [iloc](pandas-iloc.pdf) and [loc](pandas-loc.pdf) cheat sheets!