# Pandas


Pandas is a Python library for data manipulation, commonly used for data analysis and data cleaning.

It provides two primary data structures:

- **Series**: a one-dimensional, array-like object with labeled entries
- **DataFrame**: a two-dimensional, size-mutable tabular data structure
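
As a quick illustration of the two structures, the minimal sketch below builds one of each and inspects their dimensionality. The variable names and sample values are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
ages = pd.Series([33, 27, 52], name="Age")
people = pd.DataFrame({"Name": ["Pedro", "James", "John"], "Age": [33, 27, 52]})

print(ages.ndim)     # a Series has one dimension
print(people.ndim)   # a DataFrame has two dimensions
print(people.shape)  # (number of rows, number of columns)
```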


## Install packages


In [4]:
!uv pip install -q \
    pandas==2.3.2 \
    pandas-stubs==2.3.2.250827


## Import packages


In [None]:
import pandas as pd

## Usage


### Series


#### Create Series from a list


In [None]:
data1 = [1, 2, 3, 4, 5]
series1 = pd.Series(data1)
print(f"Series:\n{series1}")
print(type(series1))

Series:
0    1
1    2
2    3
3    4
4    5
dtype: int64
<class 'pandas.core.series.Series'>


#### Create Series from a dict


In [None]:
data2 = {"a": 1, "b": 2}
series2 = pd.Series(data2)
print(f"Series:\n{series2}")

Series:
a    1
b    2
dtype: int64
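
When a Series is built from a dict, the keys become the index labels. The sketch below, using hypothetical data, also passes an explicit `index` to show that pandas then selects and orders values by label, and fills labels missing from the dict with `NaN`.

```python
import pandas as pd

# Hypothetical data: "c" is requested in the index but missing from the dict.
counts = pd.Series({"a": 1, "b": 2}, index=["b", "a", "c"])

print(counts)                  # values follow the order given in `index`
print(counts.isna().tolist())  # True marks the label that had no value
```

Because `NaN` is a floating-point value, the resulting Series is stored as `float64` rather than `int64`.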


#### Working with Series indexes


In [None]:
data3 = [10, 20, 30]
index3 = ["a", "b", "c"]
series3 = pd.Series(data3, index=index3)
print(f"Series:\n{series3}")

Series:
a    10
b    20
c    30
dtype: int64
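
With a custom index in place, values can be read either by label or by position. The following is a minimal sketch under the assumption that the same data as above is used; the variable names are illustrative.

```python
import pandas as pd

scores = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_label = scores.loc["b"]       # label-based lookup
by_position = scores.iloc[1]     # position-based lookup (0-indexed)
subset = scores.loc[["a", "c"]]  # a list of labels returns a Series

print(by_label, by_position)
print(subset)
```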


### DataFrames


#### Loading Data


##### DataFrame from a dictionary of lists


In [None]:
data4 = {
    "Name": ["Pedro", "James", "John"],
    "Age": [33, 27, 52],
}

dataframe4 = pd.DataFrame(data4)
dataframe4

|   | Name | Age |
|---|------|-----|
| 0 | Pedro | 33 |
| 1 | James | 27 |
| 2 | John | 52 |


##### DataFrame from a list of dicts


In [None]:
data5 = [
    {"Name": "Pedro", "Age": 33},
    {"Name": "James", "Age": 27},
    {"Name": "John", "Age": 52},
]

dataframe5 = pd.DataFrame(data5)
dataframe5

|   | Name | Age |
|---|------|-----|
| 0 | Pedro | 33 |
| 1 | James | 27 |
| 2 | John | 52 |


#### DataFrame attributes


In [None]:
dataframe5.dtypes

Name    object
Age      int64
dtype: object
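
Besides `dtypes`, a few other attributes are often useful when first inspecting a DataFrame. The sketch below rebuilds the same sample data under a hypothetical variable name so that it runs on its own.

```python
import pandas as pd

people = pd.DataFrame(
    [
        {"Name": "Pedro", "Age": 33},
        {"Name": "James", "Age": 27},
        {"Name": "John", "Age": 52},
    ]
)

print(people.shape)          # (rows, columns)
print(list(people.columns))  # column labels
print(list(people.index))    # row labels
```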

#### DataFrame methods


In [None]:
data6 = [
    {"Name": "Pedro", "Age": 33},
    {"Name": "James", "Age": 27},
    {"Name": "John", "Age": 52},
]

dataframe6 = pd.DataFrame(data6)
dataframe6.head(2)

|   | Name | Age |
|---|------|-----|
| 0 | Pedro | 33 |
| 1 | James | 27 |


In [None]:
dataframe6.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Age     3 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 180.0+ bytes


In [None]:
dataframe6.describe()

|   | Age |
|---|-----|
| count | 3.0 |
| mean | 37.333333 |
| std | 13.051181 |
| min | 27.0 |
| 25% | 30.0 |
| 50% | 33.0 |
| 75% | 42.5 |
| max | 52.0 |
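
The individual statistics reported by `describe()` can also be computed one at a time on a column. This is a minimal sketch using the same ages as above; the variable names are assumptions.

```python
import pandas as pd

ages = pd.Series([33, 27, 52], name="Age")

mean_age = ages.mean()
median_age = ages.median()  # corresponds to the 50% row
std_age = ages.std()        # sample standard deviation (ddof=1 by default)

print(mean_age, median_age, std_age)
```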


In [None]:
dataframe6.tail(2)

|   | Name | Age |
|---|------|-----|
| 1 | James | 27 |
| 2 | John | 52 |


##### Accessing a Series


In [None]:
print(dataframe6["Name"])
print(type(dataframe6["Name"]))

0    Pedro
1    James
2     John
Name: Name, dtype: object
<class 'pandas.core.series.Series'>
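
Selecting with a single column label returns a Series, while selecting with a list of labels returns a DataFrame, even when the list has only one entry. A short sketch, with hypothetical variable names:

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Pedro", "James", "John"], "Age": [33, 27, 52]})

only_names = people["Name"]      # single label -> Series
names_frame = people[["Name"]]   # list of labels -> DataFrame

print(type(only_names))
print(type(names_frame))
```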


##### Accessing a row by index label


In [None]:
dataframe6.loc[0]

Name    Pedro
Age        33
Name: 0, dtype: object
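
Note that `loc` looks rows up by label, not by position; with the default `RangeIndex` the two happen to coincide. The sketch below assumes a hypothetical string index to make the difference from the position-based `iloc` visible.

```python
import pandas as pd

# Hypothetical string labels, used only to separate labels from positions.
people = pd.DataFrame(
    {"Name": ["Pedro", "James", "John"], "Age": [33, 27, 52]},
    index=["p", "j", "k"],
)

row_by_label = people.loc["j"]     # label-based
row_by_position = people.iloc[1]   # position-based (0-indexed)

print(row_by_label)
```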

##### Accessing a particular value


In [None]:
dataframe6.at[1, "Age"]

np.int64(27)

In [None]:
dataframe6.iat[2, 1]

np.int64(52)
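
The `at` accessor can also be used to assign a single value by row label and column label. A minimal sketch, using a hypothetical copy of the same data:

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Pedro", "James", "John"], "Age": [33, 27, 52]})

# Set one cell by row label and column label.
people.at[1, "Age"] = 28

print(people.at[1, "Age"])  # read back by label
print(people.iat[1, 1])     # same cell, read back by position
```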

#### DataFrame data manipulation


In [None]:
data7 = [
    {"Name": "Pedro", "Age": 33},
    {"Name": "James", "Age": 27},
    {"Name": "John", "Age": 52},
]

dataframe7 = pd.DataFrame(data7)
dataframe7.head(2)

|   | Name | Age |
|---|------|-----|
| 0 | Pedro | 33 |
| 1 | James | 27 |


##### Adding a new column


In [None]:
dataframe7["City"] = ["New York", "Florida", "Los Angeles"]
dataframe7

|   | Name | Age | City |
|---|------|-----|------|
| 0 | Pedro | 33 | New York |
| 1 | James | 27 | Florida |
| 2 | John | 52 | Los Angeles |
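
A new column can also be derived from an existing one. The sketch below uses `DataFrame.assign`, which returns a new DataFrame and leaves the original untouched; the `AgeInMonths` column is a hypothetical example.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Pedro", "James", "John"], "Age": [33, 27, 52]})

# assign() returns a new DataFrame with the extra column.
with_months = people.assign(AgeInMonths=people["Age"] * 12)

print(with_months)
```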


##### Removing a column


In [None]:
# axis=0 drops rows, axis=1 drops columns; drop() returns a new DataFrame
# and leaves dataframe7 unchanged.
dataframe7.drop("City", axis=1)

|   | Name | Age |
|---|------|-----|
| 0 | Pedro | 33 |
| 1 | James | 27 |
| 2 | John | 52 |
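
Because `drop` returns a new DataFrame, the result has to be assigned to a name if it should be kept. A minimal sketch with hypothetical variable names, using the `columns=` keyword as an alternative to `axis=1`:

```python
import pandas as pd

people = pd.DataFrame(
    {
        "Name": ["Pedro", "James", "John"],
        "Age": [33, 27, 52],
        "City": ["New York", "Florida", "Los Angeles"],
    }
)

# Keep the result under a new name; `people` still has the City column.
without_city = people.drop(columns="City")

print(list(people.columns))
print(list(without_city.columns))
```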


##### Removing a row


In [None]:
dataframe7.drop(1, axis=0)

|   | Name | Age | City |
|---|------|-----|------|
| 0 | Pedro | 33 | New York |
| 2 | John | 52 | Los Angeles |

##### Modifying column values


In [None]:
dataframe7["Age"] = dataframe7["Age"] + 1
dataframe7

|   | Name | Age | City |
|---|------|-----|------|
| 0 | Pedro | 34 | New York |
| 1 | James | 28 | Florida |
| 2 | John | 53 | Los Angeles |
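
The update above changes every row. To change only some rows, one option is a boolean mask combined with `loc`. The threshold of 30 and the variable names below are assumptions for illustration.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Pedro", "James", "John"], "Age": [33, 27, 52]})

# Boolean mask: True for rows where Age is above the hypothetical threshold.
older_than_30 = people["Age"] > 30

# Increment Age only in the rows selected by the mask.
people.loc[older_than_30, "Age"] += 1

print(people)
```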
