#### Pandas: DataFrame and Series
Pandas is a powerful data manipulation library in Python, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. A Series is a one-dimensional array-like object, while a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In [2]:
import pandas as pd
# Polars is a DataFrame library written in Rust, often mentioned as an alternative to pandas.

# Data broadly falls into three categories:
# 1. Structured: tabular data, RDBMS tables, CSV, TSV, Excel
# 2. Semi-structured: JSON, XML, YAML
# 3. Unstructured: video, audio, etc.
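The first two categories listed in the cell above can be loaded into pandas in different ways. The following is a minimal sketch, not part of the original notebook: the sample CSV text and the nested records are hypothetical, and `pd.read_csv` plus `pd.json_normalize` are just one reasonable pairing for structured and semi-structured input.

```python
import io

import pandas as pd

# Structured: CSV text parsed directly into a table (hypothetical sample data).
csv_text = "name,age\nAlice,24\nBob,27\n"
csv_df = pd.read_csv(io.StringIO(csv_text))

# Semi-structured: nested JSON-like records flattened into columns.
records = [
    {"name": "Alice", "address": {"city": "New York"}},
    {"name": "Bob", "address": {"city": "Boston"}},
]
json_df = pd.json_normalize(records)

print(csv_df.shape)
print(list(json_df.columns))
```

Unstructured data such as audio or video usually needs a separate decoding step before anything tabular can be built from it.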

In [3]:
print(pd.__version__)

2.2.3


## Core Data Structures

![image.png](attachment:image.png)

### Series

A Series is a one-dimensional labeled array capable of holding any data type. It is similar to a column in a spreadsheet or a single variable in statistics.



In [4]:


data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print("Series \n", series)
print(type(series))

Series 
 0    1
1    2
2    3
3    4
4    5
dtype: int64
<class 'pandas.core.series.Series'>


In [11]:
series.index

RangeIndex(start=0, stop=5, step=1)

In [4]:
import numpy as np

In [None]:
# Creating a Series from a NumPy array
arr = np.array([22, 35, 58, 42, 31])
pd.Series(arr)

0    22
1    35
2    58
3    42
4    31
dtype: int32
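The `int32` shown above is inherited from the NumPy array, whose default integer type depends on the platform and NumPy version; other machines will typically show `int64`. As a small sketch (the variable names `ages_i64` and `ages_f` are illustrative, not from the notebook), the dtype can be requested explicitly so the result is the same everywhere:

```python
import numpy as np
import pandas as pd

arr = np.array([22, 35, 58, 42, 31])

# Request a fixed-width integer dtype instead of relying on the platform default.
ages_i64 = pd.Series(arr, dtype="int64")
print(ages_i64.dtype)

# astype converts an existing Series to another dtype and returns a new Series.
ages_f = ages_i64.astype("float64")
print(ages_f.dtype)
```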

In [5]:
# Creating a Series from a list with custom index labels
# Think of a Series as a list in which every value carries a label

ages = pd.Series([22, 35, 58, 42, 31],
                 index=['n1', 'n2', 'n3', 'n4', 'n5'])
print(ages)

n1    22
n2    35
n3    58
n4    42
n5    31
dtype: int64


#### Labels
If no index is specified, pandas labels the values with their integer position: the first value has label 0, the second has label 1, and so on. When a custom index is supplied, as with `ages`, those labels are used instead.

A label can be used to access the corresponding value.

In [6]:
ages['n1']

22

In [9]:
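# Return the fourth value by integer position (positions start at 0):

# print(ages[0])  # positional access with [] on a labeled Series is deprecated; use .iloc
print(ages.iloc[3])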


42
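Label-based and position-based access can be compared side by side. This is a minimal sketch that assumes a Series with the unique labels `n1` to `n5`; it uses `.loc` for labels and `.iloc` for positions, and the variable names are illustrative.

```python
import pandas as pd

# Assumed data: the ages from above with unique labels n1..n5.
ages = pd.Series([22, 35, 58, 42, 31], index=["n1", "n2", "n3", "n4", "n5"])

by_label = ages.loc["n2"]    # lookup by index label
by_position = ages.iloc[1]   # lookup by zero-based position

# Label slices include the end label; positional slices exclude the end position.
label_slice = ages.loc["n1":"n3"]
position_slice = ages.iloc[0:2]

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

The difference in slice endpoints is a common source of off-by-one surprises when switching between `.loc` and `.iloc`.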


In [10]:
# Create a Series from a dictionary: the keys become the index labels
student_scores = {'Alice': 85, 'Bob': 92, 'Charlie': 78, 'David': 88}
scores = pd.Series(student_scores)
print(scores)

Alice      85
Bob        92
Charlie    78
David      88
dtype: int64


In [13]:

scores.index

Index(['Alice', 'Bob', 'Charlie', 'David'], dtype='object')
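Because dictionary keys become index labels, an explicit `index` argument can select and reorder them. The sketch below is an illustration rather than part of the original notebook; the label `Eva` is a hypothetical name that is deliberately missing from the dictionary.

```python
import pandas as pd

student_scores = {"Alice": 85, "Bob": 92, "Charlie": 78, "David": 88}

# The index argument picks keys in the given order.
# "Eva" has no entry in the dictionary, so her value becomes NaN.
selected = pd.Series(student_scores, index=["David", "Alice", "Eva"])
print(selected)
```

Since NaN is a floating-point value, the resulting Series is stored as `float64` rather than `int64`.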

In [None]:
# Boolean filtering: keep only the scores above 85
print(scores[scores > 85])

Bob      92
David    88
dtype: int64
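The filter above works because `scores > 85` produces a boolean Series that is then used as a mask. A short sketch of how such masks can be inspected and combined (the names `mask`, `between`, and `not_top` are illustrative):

```python
import pandas as pd

scores = pd.Series({"Alice": 85, "Bob": 92, "Charlie": 78, "David": 88})

# The comparison yields one True/False value per label.
mask = scores > 85
print(mask)

# Combine conditions with & (and), | (or), ~ (not); each condition needs parentheses.
between = scores[(scores >= 80) & (scores < 90)]
not_top = scores[~mask]

print(between)
print(not_top)
```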


In [12]:
# Statistical methods
print(scores.mean())  # Average score
print(scores.describe())  # Summary statistics

85.75
count     4.000000
mean     85.750000
std       5.909033
min      78.000000
25%      83.250000
50%      86.500000
75%      89.000000
max      92.000000
dtype: float64
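The values reported by `describe()` are also available as individual methods. The following sketch recomputes a few of them; note that `std()` uses the sample standard deviation (`ddof=1`) by default, which is why it matches the `std` row above.

```python
import pandas as pd

scores = pd.Series({"Alice": 85, "Bob": 92, "Charlie": 78, "David": 88})

median_score = scores.median()           # same as the 50% row of describe()
sample_std = scores.std()                # ddof=1, as reported by describe()
population_std = scores.std(ddof=0)      # divide by n instead of n - 1
top_student = scores.idxmax()            # label of the highest score

print(median_score, top_student)
print(round(sample_std, 6), round(population_std, 6))
```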


### DataFrame

A DataFrame is a 2-dimensional labeled data structure with columns that can be of different types. It's similar to a spreadsheet, SQL table, or a dictionary of Series objects.

#### Creating a DataFrame from a dictionary:

In [13]:
# Create a DataFrame from a dictionary of lists
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New York', 'Boston', 'Chicago', 'Seattle', 'San Francisco'],
    'Salary': [65000, 72000, 59000, 82000, 75000]
}

df = pd.DataFrame(data)
df

      Name  Age           City  Salary
0    Alice   24       New York   65000
1      Bob   27         Boston   72000
2  Charlie   22        Chicago   59000
3    David   32        Seattle   82000
4      Eva   29  San Francisco   75000
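To connect the DataFrame back to the "dictionary of Series" description, here is a minimal sketch showing that selecting one column returns a Series, along with a few basic inspection attributes. The variable names `age_column` and `mean_salary` are illustrative.

```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [24, 27, 22, 32, 29],
    "City": ["New York", "Boston", "Chicago", "Seattle", "San Francisco"],
    "Salary": [65000, 72000, 59000, 82000, 75000],
}
df = pd.DataFrame(data)

# Selecting a single column returns a Series that shares the DataFrame's row index.
age_column = df["Age"]
print(type(age_column).__name__)

print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # each column has its own dtype

# Series methods work on a column as they do on a standalone Series.
mean_salary = df["Salary"].mean()
print(mean_salary)
```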


In [14]:
# Create a DataFrame from a list of dictionaries

employees = [
    {'Name': 'Alice', 'Department': 'HR', 'Hire Date': '2020-05-15'},
    {'Name': 'Bob', 'Department': 'Engineering', 'Hire Date': '2019-11-20'},
    {'Name': 'Charlie', 'Department': 'Marketing', 'Hire Date': '2021-02-10'}
]

employee_df = pd.DataFrame(employees)
print(employee_df)

      Name   Department   Hire Date
0    Alice           HR  2020-05-15
1      Bob  Engineering  2019-11-20
2  Charlie    Marketing  2021-02-10


In [15]:
employee_df

      Name   Department   Hire Date
0    Alice           HR  2020-05-15
1      Bob  Engineering  2019-11-20
2  Charlie    Marketing  2021-02-10
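In the DataFrame above, `Hire Date` is stored as plain strings. As a possible next step (a sketch, not part of the original notebook), the column can be converted with `pd.to_datetime` so that date-based operations become available; `hire_years` and `earliest` are illustrative names.

```python
import pandas as pd

employees = [
    {"Name": "Alice", "Department": "HR", "Hire Date": "2020-05-15"},
    {"Name": "Bob", "Department": "Engineering", "Hire Date": "2019-11-20"},
    {"Name": "Charlie", "Department": "Marketing", "Hire Date": "2021-02-10"},
]
employee_df = pd.DataFrame(employees)

# Convert the string column to datetime64 values.
employee_df["Hire Date"] = pd.to_datetime(employee_df["Hire Date"])

# The .dt accessor exposes date components such as the year.
hire_years = employee_df["Hire Date"].dt.year

# Find the name of the employee with the earliest hire date.
earliest = employee_df.loc[employee_df["Hire Date"].idxmin(), "Name"]

print(list(hire_years))
print(earliest)
```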
