## Pandas Intro
Pandas is an open-source data analysis and manipulation library for Python. It provides the data structures and functions needed to load, clean, and analyze structured (tabular) data.  
Its two primary data structures are the Series (one-dimensional) and the DataFrame (two-dimensional).

In [None]:
# To install Pandas, use the following command:
# pip install pandas

In [1]:
# Importing Pandas
import pandas as pd

In [2]:
# A Series is a one-dimensional labeled array capable of holding any data type.
# It is similar to a NumPy array, but every element also has an index label.
# When no index is given, Pandas assigns the default integer labels 0, 1, 2, ...
# An explicit index can be supplied with the index parameter.
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print(series)

0    1
1    2
2    3
3    4
4    5
dtype: int64
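
To make the comparison with NumPy arrays concrete, here is a minimal sketch. The variable names are illustrative only; it shows that a Series holds the same values as an array plus an index of labels.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3, 4, 5])
series = pd.Series(values)

# A NumPy array only knows positions; a Series also carries an index of labels.
print(series.index)       # the default labels, a RangeIndex from 0 to 4
print(series.to_numpy())  # the underlying values as a NumPy array
print(series.loc[2])      # the value stored under label 2
```

With the default index, labels and positions coincide, so `series.loc[2]` and `series.iloc[2]` return the same element.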


In [3]:
# A DataFrame is a two-dimensional labeled data structure with columns of potentially different types.
# A dictionary: each key becomes a column name and each list becomes that column's values
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London']
}
# Creating a simple DataFrame
df = pd.DataFrame(data)
print(df)
# A DataFrame is similar to a table in a relational database.
# It can be created from a dictionary, list, or another DataFrame.

    Name  Age      City
0   John   28  New York
1   Anna   24     Paris
2  Peter   35    Berlin
3  Linda   32    London
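
The cell above builds a DataFrame from a dictionary. As a sketch of the other construction routes mentioned in its comments, the following uses a list of dictionaries, a list of lists, and an existing DataFrame. The variable names and the two sample rows are illustrative assumptions.

```python
import pandas as pd

# From a list of dictionaries: each dictionary is one row.
rows = [
    {'Name': 'John', 'Age': 28},
    {'Name': 'Anna', 'Age': 24},
]
df_from_rows = pd.DataFrame(rows)

# From a list of lists: column names are supplied separately.
df_from_lists = pd.DataFrame([['John', 28], ['Anna', 24]], columns=['Name', 'Age'])

# From another DataFrame: copy() returns an independent object,
# so changing the copy leaves the original untouched.
df_copy = df_from_rows.copy()
df_copy.loc[0, 'Age'] = 99

print(df_from_rows)
print(df_from_lists)
print(df_copy)
```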


In [4]:
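# Creating a Series with an explicit index: each value is paired with the label at the same position
data = [1, 2, 3, 4, 5]
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)
print(series)
# Series can also be created from dictionaries (the keys become the index labels).
# Operations on Series are typically element-wise.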

a    1
b    2
c    3
d    4
e    5
dtype: int64
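
The two comments in the cell above can be sketched as follows. The Series names and values are made up for illustration. Note that arithmetic between two Series pairs elements by label, not by position.

```python
import pandas as pd

# Dictionary keys become the index labels.
s = pd.Series({'a': 1, 'b': 2, 'c': 3})

# Element-wise operation with a scalar: every value is doubled.
doubled = s * 2

# Element-wise operation between two Series: values are matched by label.
# Labels present in only one Series ('a' and 'd') produce NaN.
other = pd.Series({'b': 10, 'c': 20, 'd': 30})
total = s + other

print(doubled)
print(total)
```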


### Pandas DataFrame

In [13]:
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London']
}

df = pd.DataFrame(data)

print(df)
print("--------------")
# Selecting a column
print("Name Column:")
print(df['Name'])
print("--------------")

# Selecting multiple columns
print("Two Columns:")
print(df[['Name', 'City']])
print("--------------")

# Selecting a row by label (with the default index, label 1 is the second row)
print("Second row:")
print(df.loc[1])
print("--------------")
# Selecting a row by integer location (position 2 is the third row)
print("Selecting a row by integer location:")
print(df.iloc[2])

# Use loc to select by label and iloc to select by integer location.
# You can also use slicing and conditional selection.

    Name  Age      City
0   John   28  New York
1   Anna   24     Paris
2  Peter   35    Berlin
3  Linda   32    London
--------------
Name Column:
0     John
1     Anna
2    Peter
3    Linda
Name: Name, dtype: object
--------------
Two Columns:
    Name      City
0   John  New York
1   Anna     Paris
2  Peter    Berlin
3  Linda    London
--------------
Second row:
Name     Anna
Age        24
City    Paris
Name: 1, dtype: object
--------------
Selecting a row by integer location:
Name     Peter
Age         35
City    Berlin
Name: 2, dtype: object
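
Slicing and conditional selection are mentioned above but not shown. A minimal sketch on the same data follows; the result variable names are illustrative. One detail worth knowing: a `loc` slice includes its end label, while an `iloc` slice excludes its end position.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London']
})

# Label-based slice: rows labeled 1 and 2 (end label included).
by_label = df.loc[1:2]

# Position-based slice: only the row at position 1 (end position excluded).
by_position = df.iloc[1:2]

# Conditional selection combined with column selection:
# Name and City of everyone younger than 30.
subset = df.loc[df['Age'] < 30, ['Name', 'City']]

print(by_label)
print(by_position)
print(subset)
```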


### Pandas Reading Files

In [None]:
# Reading a CSV file ('address/nameOfTheDataset.csv' is a placeholder; use the path to your own file)
df = pd.read_csv('address/nameOfTheDataset.csv')

print(df.head())  # Print the first 5 rows of the DataFrame

# read_csv accepts many parameters, for example sep (the delimiter), header (the row holding the column names),
# names (explicit column names), and index_col (the column to use as the index).
# Use df.head() to quickly inspect the first few rows of a DataFrame.
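
To try the `read_csv` parameters without a file on disk, the sketch below reads from an in-memory string through `io.StringIO`. The sample data, the semicolon delimiter, and the column names are assumptions made for illustration.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data with no header row.
csv_text = "1;John;28\n2;Anna;24\n"

df = pd.read_csv(
    io.StringIO(csv_text),        # a file-like object stands in for a file path
    sep=';',                      # fields are separated by semicolons
    header=None,                  # the data has no header row
    names=['Id', 'Name', 'Age'],  # so column names are supplied explicitly
    index_col='Id',               # use the Id column as the index
)

print(df.head())
```

Replacing `io.StringIO(csv_text)` with a file path reads the same layout from disk.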

In [None]:
# Reading a JSON file ('address/nameOfTheDataset.json' is a placeholder; use the path to your own file)
df = pd.read_json('address/nameOfTheDataset.json')

print(df.head())  # Print the first 5 rows of the DataFrame

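# The read_json function parses JSON from a file path, URL, or file-like object into a DataFrame.
# Recent Pandas versions deprecate passing a raw JSON string directly; wrap it in io.StringIO instead.
# The JSON should have a table-like layout, such as a list of records or a mapping of column names to values.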
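
As a small illustration of a JSON layout that converts cleanly into a table, the sketch below parses a list of records held in a string. The sample records are made up, and the string is wrapped in `io.StringIO` so that `read_json` receives a file-like object.

```python
import io
import pandas as pd

# Hypothetical JSON: a list of records, one object per row.
json_text = '[{"Name": "John", "Age": 28}, {"Name": "Anna", "Age": 24}]'

df = pd.read_json(io.StringIO(json_text))

print(df.head())
```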

### Pandas Analyzing Data

In [17]:
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London']
}

df = pd.DataFrame(data)

# Descriptive statistics (describe() summarizes numeric columns by default, so only Age appears)
print(df.describe())
print("----------------------")
# Mean of a column
print("Mean value of Age column:", df['Age'].mean())
print("----------------------")
# Filtering data with a boolean condition
print("People older than 30:")
print(df[df['Age'] > 30])

             Age
count   4.000000
mean   29.750000
std     4.787136
min    24.000000
25%    27.000000
50%    30.000000
75%    32.750000
max    35.000000
----------------------
Mean value of Age column: 29.75
----------------------
People older than 30:
    Name  Age    City
2  Peter   35  Berlin
3  Linda   32  London
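
Filtering and aggregation can be chained. The following sketch reuses the same data; the variable names are illustrative. It computes a statistic on a filtered subset and combines two conditions. Each condition needs its own parentheses, and conditions are joined with `&` (and) or `|` (or) in place of Python's `and` / `or`.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London']
})

# Mean age of the people older than 30 (Peter and Linda).
older = df[df['Age'] > 30]
older_mean = older['Age'].mean()
print("Mean age of people older than 30:", older_mean)

# Two conditions combined: older than 25 and not living in Berlin.
both = df[(df['Age'] > 25) & (df['City'] != 'Berlin')]
print(both)
```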
