# Pandas

* [Basics](#Basics)
  * [Creating a dataframe](#dataframe)
  * [Axes and shape](#axes)
  * [Data](#Data)
  * [Access data](#access)
    * [loc](#loc)
    * [iloc](#iloc)
    * [at](#at)
    * [iat](#iat)

In [29]:
import pandas as pd

def init():
    """Reset the global example frames df and df2 to their initial state."""
    global df
    global df2
    
    df = pd.DataFrame({
        "a": [1, 2, 3],
        "b": ["a", "b", "c"],
        "c": [3.1, 4., 5.71]
    })
    
    df2 = pd.DataFrame({
        "d": [5, 6, 7],
        "e": [8, 9, 10],
        "f": [11, 12, 13]
    })
    
init()
df

,a,b,c
0,1,a,3.1
1,2,b,4.0
2,3,c,5.71


In [30]:
df2

,d,e,f
0,5,8,11
1,6,9,12
2,7,10,13


## Basics

The DataFrame is the basis of most pandas operations. It organizes data into labeled rows and columns, and can represent more than two dimensions if needed (for example with a hierarchical index).

### Creating a dataframe <a id="dataframe" />

You can create a new DataFrame from a dictionary of columns, as in the cells above. You can also specify a few more things.

When you build it by column, you can also specify the index:

In [2]:
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [2, 3, 4]
}, index=["a", "b", "c"])
df

,a,b
a,1,2
b,2,3
c,3,4


You can also build it by row and specify both the index and the columns:

In [3]:
df = pd.DataFrame([
    [1, 2, 3],
    [2, 3, 4]
], index=["a", "b"], columns=["a", "b", "c"])
df

,a,b,c
a,1,2,3
b,2,3,4

Rows can also be given as Series. Their index labels become the columns, and a label missing from one Series leaves an empty (NaN) cell:


In [4]:
df = pd.DataFrame([
    pd.Series(["a", "b", "c"], index=['row', 'row2', 'row3']), 
    pd.Series(["a", "c", "d"], index=['row', 'row3', 'row4'])
])
df

,row,row2,row3,row4
0,a,b,c,
1,a,,c,d
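
The empty cells in the output above are missing values. As a minimal sketch (the names `rows` and `frame` are only illustrative), the following rebuilds the same frame and checks where the gaps are with `isna()`:

```python
import pandas as pd

# Two Series with partially overlapping labels, as in the cell above.
rows = [
    pd.Series(["a", "b", "c"], index=["row", "row2", "row3"]),
    pd.Series(["a", "c", "d"], index=["row", "row3", "row4"]),
]
frame = pd.DataFrame(rows)

# The columns are the union of the labels of both Series.
print(sorted(frame.columns))

# A cell is missing wherever a Series had no value for that label.
print(frame.isna())
```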


### Axes and shape <a id="axes" />

You can get various information on a dataframe:

In [5]:
init()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   a       3 non-null      int64  
 1   b       3 non-null      object 
 2   c       3 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes


In [6]:
df.index

RangeIndex(start=0, stop=3, step=1)

In [7]:
df.columns

Index(['a', 'b', 'c'], dtype='object')

In [8]:
df.dtypes

a      int64
b     object
c    float64
dtype: object

In [9]:
df.values

array([[1, 'a', 3.1],
       [2, 'b', 4.0],
       [3, 'c', 5.71]], dtype=object)

In [10]:
df.axes

[RangeIndex(start=0, stop=3, step=1), Index(['a', 'b', 'c'], dtype='object')]

In [11]:
df.empty

False

In [12]:
df.size

9

In [13]:
df.shape

(3, 3)

In [14]:
df.ndim

2

In [15]:
df.keys()

Index(['a', 'b', 'c'], dtype='object')
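
These attributes are related to each other. The short sketch below, which rebuilds the example frame so that it runs on its own, spells out a few of those relations:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": ["a", "b", "c"],
    "c": [3.1, 4.0, 5.71],
})

n_rows, n_cols = df.shape

# size is the total number of cells.
print(df.size == n_rows * n_cols)
# ndim is the number of axes (rows and columns).
print(df.ndim == len(df.axes))
# keys() returns the same labels as columns.
print(list(df.keys()) == list(df.columns))
# len() counts rows only.
print(len(df) == n_rows)
```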

### Data

You can easily access details of the data:

In [16]:
df.describe()

,a,c
count,3.0,3.0
mean,2.0,4.27
std,1.0,1.325783
min,1.0,3.1
25%,1.5,3.55
50%,2.0,4.0
75%,2.5,4.855
max,3.0,5.71
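
The result of `describe()` is itself a DataFrame indexed by the statistic names, so a single figure can be picked out of it. A small sketch, with `stats`, `mean_a` and `max_c` as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": ["a", "b", "c"],
    "c": [3.1, 4.0, 5.71],
})

# By default only the numeric columns (a and c) are summarized.
stats = df.describe()

# Rows are labeled by statistic, columns by the original column names.
mean_a = stats.loc["mean", "a"]
max_c = stats.loc["max", "c"]
print(mean_a, max_c)
```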


In [17]:
df.describe(include="all")

,a,b,c
count,3.0,3,3.0
unique,,3,
top,,a,
freq,,1,
mean,2.0,,4.27
std,1.0,,1.325783
min,1.0,,3.1
25%,1.5,,3.55
50%,2.0,,4.0
75%,2.5,,4.855
max,3.0,,5.71


In [18]:
df.head(1)

,a,b,c
0,1,a,3.1


In [19]:
df.tail(1)

,a,b,c
2,3,c,5.71


In [20]:
df.sample(1)

,a,b,c
2,3,c,5.71
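
Since `sample` draws rows at random, the row shown above can differ from one run to the next. If a repeatable draw is needed, passing `random_state` should help, as in this sketch (the seed value 0 is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": ["a", "b", "c"],
    "c": [3.1, 4.0, 5.71],
})

# The same seed gives the same draw on every call.
first = df.sample(n=2, random_state=0)
second = df.sample(n=2, random_state=0)
print(first.equals(second))
```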


In [21]:
df.get("a")

0    1
1    2
2    3
Name: a, dtype: int64

In [22]:
df.get(["a", "b"])

,a,b
0,1,a
1,2,b
2,3,c
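
What sets `get` apart from `df["a"]` is its behavior on a missing column: it returns a default instead of raising. A minimal sketch, where the column name `"z"` is deliberately one that does not exist:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": ["a", "b", "c"],
    "c": [3.1, 4.0, 5.71],
})

# No column "z": get() returns None rather than raising KeyError.
missing = df.get("z")
print(missing)

# A custom default can be supplied instead.
fallback = df.get("z", default="no such column")
print(fallback)

# Bracket access raises for the same key.
try:
    df["z"]
    raised = False
except KeyError:
    raised = True
print(raised)
```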


In [23]:
df.index = pd.Index([1, 'a', 0])
df

,a,b,c
1,1,a,3.1
a,2,b,4.0
0,3,c,5.71


### Access data <a id="access" />

You can access the data through several accessors.

Each of them can be used both to read and to write data.

#### loc

In [24]:
print("Simple access")
print(df.loc[0])
print("Access from list")
print(df.loc[['a', 0]])
print("Access from slice")
print(df.loc['a':0])
print(df.loc[1:0])

Simple access
a       3
b       c
c    5.71
Name: 0, dtype: object
Access from list
   a  b     c
a  2  b  4.00
0  3  c  5.71
Access from slice
   a  b     c
a  2  b  4.00
0  3  c  5.71
   a  b     c
1  1  a  3.10
a  2  b  4.00
0  3  c  5.71
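
Two points in the output above are worth spelling out: a label slice with .loc includes both of its ends, and .loc also accepts a second selector for the columns. The sketch below uses a frame with the string index `x`, `y`, `z`, which is an assumption made here to keep labels and positions easy to tell apart:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": ["a", "b", "c"],
    "c": [3.1, 4.0, 5.71],
}, index=["x", "y", "z"])

# Label slice: both "x" and "y" are included.
by_label = df.loc["x":"y"]
print(by_label)

# Position slice, for contrast: the end position is excluded.
by_position = df.iloc[0:1]
print(by_position)

# Rows and columns can be selected together.
block = df.loc[["x", "z"], ["a", "c"]]
print(block)
```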


In [25]:
print("Access from boolean array")
print(df.loc[[True, False, True]])
print("Access from Series (with matching indices)")
print(df.loc[pd.Series([True, False, True], index=[0, 1, 'a'])])
print("Access from function")
print(df.loc[lambda x: x.a > 1])
print("Access from Series obtained by condition")
print(df.loc[df.a > 1])
print("Access from index")
print(df.loc[pd.Index([0, 1])])

df.loc[pd.Index([0, 1])] = [[3, 'c', 5.71], [2, 'c', 3.2]]

print("After edit")
print(df.loc[pd.Index([0, 1])])

Access from boolean array
   a  b     c
1  1  a  3.10
0  3  c  5.71
Access from Series (with matching indices)
   a  b     c
a  2  b  4.00
0  3  c  5.71
Access from function
   a  b     c
a  2  b  4.00
0  3  c  5.71
Access from Series obtained by condition
   a  b     c
a  2  b  4.00
0  3  c  5.71
Access from index
   a  b     c
0  3  c  5.71
1  1  a  3.10
After edit
   a  b     c
0  3  c  5.71
1  2  c  3.20
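
A condition and a column label can be combined in one .loc call, which is a common way to write to a subset of rows. A minimal sketch on a fresh copy of the example data (the replacement value 0.0 is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": ["a", "b", "c"],
    "c": [3.1, 4.0, 5.71],
})

# Set column "c" to 0.0 on every row where "a" is greater than 1.
df.loc[df["a"] > 1, "c"] = 0.0
print(df)
```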


#### iloc

While .loc selects by index label, .iloc selects by integer position, whatever the index labels are.

In [31]:
init()
df.index = pd.Index([1, 'a', 0])

print("Access from index")
print(df.iloc[0])
print("Access from indices array")
print(df.iloc[[0, 2]])
print("Access from index slice")
print(df.iloc[0:2])
print("Access from boolean list (identical to .loc)")
print(df.iloc[[True, False, True]])

Access from index
a      1
b      a
c    3.1
Name: 1, dtype: object
Access from indices array
   a  b     c
1  1  a  3.10
0  3  c  5.71
Access from index slice
   a  b    c
1  1  a  3.1
a  2  b  4.0
Access from boolean list (identical to .loc)
   a  b     c
1  1  a  3.10
0  3  c  5.71
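
With the index used in this section, the label 0 and the position 0 point at different rows, which makes the difference between the two accessors visible. The sketch below also shows a negative position and a combined row and column selection; it rebuilds the frame so that it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": ["a", "b", "c"],
    "c": [3.1, 4.0, 5.71],
}, index=[1, "a", 0])

# Label 0 is the last row, position 0 is the first row.
print(df.loc[0]["a"], df.iloc[0]["a"])

# Negative positions count from the end.
last_row = df.iloc[-1]
print(last_row.name)

# First two rows, first and third columns.
block = df.iloc[0:2, [0, 2]]
print(block)
```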


#### at

.at accesses a single value. You have to give a label for every dimension, that is, both the row and the column.

In [32]:
print(df.at['a', 'a'])

2
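
Like the other accessors, .at can be used for writing. A short sketch on a fresh copy of the frame (the new value 20 is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": ["a", "b", "c"],
    "c": [3.1, 4.0, 5.71],
}, index=[1, "a", 0])

# Read the cell at row label "a", column label "a".
before = df.at["a", "a"]

# Overwrite that single cell.
df.at["a", "a"] = 20
after = df.at["a", "a"]
print(before, after)
```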


#### iat

.iat is to .at what .iloc is to .loc: access to a single value by integer position.

In [33]:
print(df.iat[1, 0])

2
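
To connect the two accessors, a label can be translated into a position with `Index.get_loc`. The sketch below does this for the cell read above and then writes through .iat; `row_pos` and `col_pos` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": ["a", "b", "c"],
    "c": [3.1, 4.0, 5.71],
}, index=[1, "a", 0])

# Translate the labels into integer positions.
row_pos = df.index.get_loc("a")
col_pos = df.columns.get_loc("a")
print(row_pos, col_pos)

# Both accessors reach the same cell.
same_cell = df.iat[row_pos, col_pos] == df.at["a", "a"]
print(same_cell)

# .iat can also write to that cell.
df.iat[row_pos, col_pos] = 20
print(df.at["a", "a"])
```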
