# INTRODUCTION TO PANDAS

### PART I



pandas is a Python package providing fast, flexible, and expressive data structures designed to make working
with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block
for doing practical, real world data analysis in Python.

https://pandas.pydata.org/pandas-docs/stable/

In [None]:
import numpy  as np
import pandas as pd

## Data Structures


#### 1 - SERIES (1-Dimensional labeled homogeneously-typed array)
Series data structures are value-mutable, but the length cannot be changed.

#### 2 - DATAFRAME (General 2-Dimensional labeled, size-mutable tabular structure with potentially heterogeneously-typed column)
<br>
A DataFrame is similar to a sheet with rows and columns, while a Series is similar to a single column of data.
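The relationship between the two structures can be illustrated with a minimal sketch. The names `s_demo`, `df_demo`, and `col` and the toy values are assumptions made only for this illustration.

```python
import pandas as pd

# Hypothetical toy Series: values can be changed in place through their labels.
s_demo = pd.Series([10, 20, 30], index=["x", "y", "z"])
s_demo["y"] = 99  # value mutation: same length, same labels

# Hypothetical toy DataFrame: selecting one column yields a Series.
df_demo = pd.DataFrame({"price": [1.5, 2.5], "qty": [3, 4]})
col = df_demo["price"]

print(s_demo.tolist())     # [10, 99, 30]
print(type(col).__name__)  # Series
```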

# 1 - SERIES (1-Dimensional labeled homogeneously-typed array)

In [None]:
a = [1, 2, 'AA', np.nan, '2019-07-01', 3]
s = pd.Series(a)

In [None]:
print(s)
type(s)

In [None]:
s[2]

##### > Custom index names

In [None]:
index = ['A', 'B', 'C', 'D', 'date', 'E']
s = pd.Series(a, index)
s

In [None]:
s['date']
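A Series holds a single dtype, so a list that mixes numbers, strings and `NaN` is stored with the generic `object` dtype. The sketch below compares three small Series; the variable names `mixed`, `ints`, and `with_nan` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Same kind of mixed content as the list used above: falls back to object dtype.
mixed = pd.Series([1, 2, "AA", np.nan, "2019-07-01", 3])

# Only integers: pandas infers an integer dtype.
ints = pd.Series([1, 2, 3])

# Integers plus NaN: NaN is a float, so the whole Series becomes float64.
with_nan = pd.Series([1, 2, np.nan])

print(mixed.dtype)     # object
print(ints.dtype)      # an integer dtype
print(with_nan.dtype)  # float64
```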

# 2 - DATAFRAME (General 2-Dimensional labeled, size-mutable tabular structure with potentially heterogeneously-typed column)
A DataFrame has two labeled axes: the index (the rows) and the columns.

## 2.1 Creating a DataFrame with a date-string index, labeled columns and random values

##### > 2.1.1 Creating a range of dates

In [None]:
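# Ten consecutive daily timestamps starting on 10 July 2019
datelist = pd.date_range(start='2019/7/10', periods=10, freq='D')
# Format each date as a 'dd-mm-YYYY' string; the result is an Index of strings
datelist = datelist.strftime('%d-%m-%Y')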
print(datelist)
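Note that `strftime` turns the `DatetimeIndex` into a plain `Index` of strings, which is why rows are later selected with string labels such as `'10-07-2019'`. A short sketch of the difference, using the hypothetical names `dates` and `labels`:

```python
import pandas as pd

# A short range of real timestamps.
dates = pd.date_range(start="2019-07-10", periods=3, freq="D")

# After formatting, the entries are strings, not timestamps.
labels = dates.strftime("%d-%m-%Y")

print(type(dates).__name__)   # DatetimeIndex
print(type(labels).__name__)  # Index
print(list(labels))           # ['10-07-2019', '11-07-2019', '12-07-2019']
```

Keeping `dates` as the index instead would preserve datetime features such as partial-date selection; the string form is used here only because it matches the labels in this notebook.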

##### > DataFrame

In [None]:
df = pd.DataFrame(np.random.randn(10, 5), index=datelist, columns=['A', 'B', 'C', 'D', 'E'])
df

##### > Exploring the data

In [None]:
df.dtypes

In [None]:
df.info()

In [None]:
df.shape

## 2.2 Viewing Data

In [None]:
df.head()

In [None]:
df.head(10)

In [None]:
df.tail()

##### > Display index (rows) and columns

In [None]:
df.index

In [None]:
df.columns

## 2.3 Selecting Data
  iloc: Purely integer-location based indexing for selection by position <br>
  loc : Purely label-location based indexer for selection by label <br>
  iat : Fast integer location scalar accessor <br>
   at : Access a single value using a label <br>
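As a quick side-by-side of the four accessors listed above, here is a minimal sketch on a small frame. The frame `demo` and its labels are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 frame holding the values 0..5.
demo = pd.DataFrame(
    np.arange(6).reshape(3, 2),
    index=["r0", "r1", "r2"],
    columns=["A", "B"],
)

by_position = demo.iloc[1, 0]    # row position 1, column position 0
by_label = demo.loc["r1", "A"]   # the same cell, addressed by labels
scalar_pos = demo.iat[2, 1]      # single value by position
scalar_lab = demo.at["r2", "B"]  # single value by label

print(by_position, by_label)   # 2 2
print(scalar_pos, scalar_lab)  # 5 5
```

`iloc` and `loc` also accept lists and slices, while `iat` and `at` only return a single scalar.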



### 2.3.1 Selecting Columns

##### > Single column by label (and getting a Series)

In [None]:
df['A']

In [None]:
type(df['A'].values)

##### > Single column by label (and getting a DataFrame)

In [None]:
df[['A']]

In [None]:
type(df[['A']].values)
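Both selections expose a NumPy array through `.values`, but the shapes differ: a Series gives a 1-D array and a one-column DataFrame gives a 2-D array. A small sketch, with `demo`, `as_series`, and `as_frame` as illustrative names:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 4 rows and 2 columns.
demo = pd.DataFrame(np.zeros((4, 2)), columns=["A", "B"])

as_series = demo["A"]   # single brackets -> Series
as_frame = demo[["A"]]  # double brackets -> DataFrame

print(as_series.values.shape)  # (4,)
print(as_frame.values.shape)   # (4, 1)
```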

##### > Selecting more than one column by label (and getting a DataFrame)

In [None]:
df[['A', 'B']]

##### > Selecting a single column by attribute

In [None]:
df.A
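Attribute access is convenient, but it only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute. The sketch below uses a hypothetical frame `demo` with deliberately awkward column names to show where bracket access is needed.

```python
import pandas as pd

# Hypothetical frame: "count" clashes with a DataFrame method,
# and "my col" is not a valid Python identifier.
demo = pd.DataFrame({"A": [1, 2], "count": [3, 4], "my col": [5, 6]})

print(demo.A.tolist())          # [1, 2]
print(callable(demo.count))     # True: this is the DataFrame.count method
print(demo["count"].tolist())   # [3, 4]
print(demo["my col"].tolist())  # [5, 6]
```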

### 2.3.2 Selecting Rows 

##### > Rows by index

In [None]:
df[0:2]

##### > Rows by label

In [None]:
# Use labels in the same 'dd-mm-YYYY' format as the index
df['10-07-2019':'13-07-2019']

### 2.3.3 ILOC. Selecting by Position (similar to numpy/python)
`df.iloc[<row #>, <column #>]`

##### > Selecting a row by index (and getting a Series)

In [None]:
df.iloc[0]

##### > Selecting a row by index (and getting a DataFrame)

In [None]:
df.iloc[[0]]

##### > Selecting a column by index (and getting a Series)

In [None]:
df.iloc[:,2]

##### > Selecting a column by index (and getting a DataFrame)

In [None]:
df.iloc[:,[2]]

##### > Selecting columns by index

In [None]:
df.iloc[:, [0, 2, 4]]

##### > Selecting rows by index (slicing)

In [None]:
df.iloc[0:3, :]

##### > Selecting rows and columns by index (slicing)

In [None]:
df.iloc[0:3, 2:4]

In [None]:
range_A = np.r_[0:2, -2:0] # numpy.r_ Translates slice objects to concatenation along the first axis.
print(range_A)

In [None]:
df.iloc[range_A]
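To see what `np.r_` produces on its own, here is a minimal sketch that applies the same positions to a plain Python list. The names `positions`, `letters`, and `picked` are assumptions for the illustration.

```python
import numpy as np

# Concatenate two slices into one array of positions:
# the first two positions and the last two (as negative positions).
positions = np.r_[0:2, -2:0]
print(positions.tolist())  # [0, 1, -2, -1]

# Negative positions count from the end, as in ordinary Python indexing.
letters = list("abcdefghij")
picked = [letters[i] for i in positions]
print(picked)  # ['a', 'b', 'i', 'j']
```

Passing such an array to `iloc` therefore returns the first two and the last two rows in a single call.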

##### > Selecting rows and columns by index

In [None]:
df.iloc[[0, 2, 3, 5, 7], [2, 4]]

### 2.3.4 LOC : Purely label-location based indexer for selection by label

##### > Selecting a column by label (and getting a Series)

In [None]:
df.loc[:, 'A']

##### > Selecting a column by label (and getting a DataFrame)

In [None]:
df.loc[:, ['A']]

##### > Selecting columns by label

In [None]:
df.loc[:, ['A', 'B']]

##### > Selecting a row by label (and getting a Series)

In [None]:
df.loc['10-07-2019']

##### > Selecting a row by label (and getting a DataFrame)

In [None]:
df.loc[['10-07-2019']]

##### > Selecting rows by label

In [None]:
df.loc[['10-07-2019', '13-07-2019']]

##### > Selecting rows by label (slicing)

In [None]:
df.loc['10-07-2019': '13-07-2019']

##### > Selecting rows and columns by label

In [None]:
df.loc[['10-07-2019', '13-07-2019'], ['A','C']]

##### > Selecting rows and columns by label (slicing)

In [None]:
df.loc['10-07-2019': '13-07-2019', 'A':'C']
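One difference worth keeping in mind: label slices with `loc` include the end label, while position slices with `iloc` exclude the end position. A small sketch on a hypothetical frame `demo`:

```python
import pandas as pd

# Hypothetical frame with four labeled rows.
demo = pd.DataFrame({"v": [0, 1, 2, 3]}, index=["a", "b", "c", "d"])

by_label = demo.loc["a":"c"]  # end label 'c' is included
by_pos = demo.iloc[0:2]       # end position 2 is excluded

print(list(by_label.index))  # ['a', 'b', 'c']
print(list(by_pos.index))    # ['a', 'b']
```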

### 2.3.5 iat : Fast integer location scalar accessor & at : Access a single value using a label

##### > Get value at specified row/column pair (index)

In [None]:
df.iat[1,1]

##### > Get value at specified row/column pair (label)

In [None]:
df.at['10-07-2019','B']

In [None]:
# '07/10/2019' is parsed month-first (10 July 2019), then formatted to match the string labels
df.at[pd.to_datetime('07/10/2019').strftime('%d-%m-%Y'), 'B']

## 2.4 Adding new records to the data frame

### 2.4.1 Additional columns

In [None]:
df['F'] = None
df

### 2.4.2 Setting values

##### > Entire column

In [None]:
df.iloc[:, 5] = 1
df

##### > Entire row

In [None]:
df.iloc[0:1, :] = 0
df

##### > Single value

In [None]:
df.iloc[2,1] = 13
df
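The same three kinds of assignment can also be written with labels instead of positions. The following is a minimal sketch on a hypothetical frame `demo`, not a replacement for the cells above.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 frame of zeros with explicit labels.
demo = pd.DataFrame(np.zeros((3, 2)), index=["r0", "r1", "r2"], columns=["A", "B"])

demo.loc[:, "B"] = 1.0     # entire column, by label
demo.loc["r0", :] = 5.0    # entire row, by label
demo.at["r2", "A"] = 13.0  # single value, by label

print(demo.values.tolist())  # [[5.0, 5.0], [0.0, 1.0], [13.0, 1.0]]
```

Label-based assignment tends to stay correct if columns are later reordered, whereas position-based assignment depends on the current layout.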