# Topic 1: Data Structures in Pandas
- pandas provides two main data structures:
  - pandas DataFrame (2-dimensional)
  - pandas Series (1-dimensional)

# 1. Series: Creating, Manipulating, and Exploring Series
#### A Series in Pandas is a one-dimensional labeled array, capable of holding any data type (integers, strings, floats, etc.).
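As a minimal sketch of the definition above, the snippet below builds three small Series (the variable names are illustrative, not from the original notebook) and inspects the dtype pandas infers for each, along with the labels attached to the values. The exact integer and string dtype names can vary by platform and pandas version, so they are not asserted here.

```python
import pandas as pd

# One Series per element type; pandas infers the dtype from the values.
int_series = pd.Series([1, 2, 3])
float_series = pd.Series([1.5, 2.5, 3.5])
str_series = pd.Series(["a", "b", "c"])

print(int_series.dtype)    # an integer dtype (typically int64)
print(float_series.dtype)  # float64
print(str_series.dtype)    # object or a string dtype, depending on the pandas version

# A Series pairs each value with a label; the default labels are 0, 1, 2, ...
print(list(int_series.index))  # [0, 1, 2]
```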

## 1.1 Creating Series

In [139]:
import pandas as pd

# Creating a Series from a list
data_list = [10, 20, 30, 40, 50]
series_from_list = pd.Series(data_list, name="Numbers")
series_from_list


0    10
1    20
2    30
3    40
4    50
Name: Numbers, dtype: int64

In [140]:
# Creating a Series from a dictionary
data_dict = {'a': 1, 'b': 2, 'c': 3}
series_from_dict = pd.Series(data_dict, name="Alphabet Numbers")
series_from_dict

a    1
b    2
c    3
Name: Alphabet Numbers, dtype: int64

In [141]:
# Creating a Series with custom index
custom_index_series = pd.Series([100, 200, 300], index=['X', 'Y', 'Z'], name="Custom Indexed Series")
custom_index_series

X    100
Y    200
Z    300
Name: Custom Indexed Series, dtype: int64

## 1.2 Manipulating Series

In [142]:
# Accessing elements
print("\nAccessing element at index 2:", series_from_list[2])  # Label 2; with the default integer index this is also position 2
print("Accessing element with label 'b':", series_from_dict['b'])  # Label-based access


Accessing element at index 2: 30
Accessing element with label 'b': 2
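Square-bracket access with an integer looks the value up by label, which only coincides with position when the index is the default 0, 1, 2, ... The sketch below uses a hypothetical Series `s` whose integer labels are deliberately shuffled to show how `.loc` (label) and `.iloc` (position) differ.

```python
import pandas as pd

# Hypothetical Series whose integer labels do not match the positions.
s = pd.Series([10, 20, 30], index=[2, 0, 1])

print(s.loc[2])   # label 2 is the first element -> 10
print(s.iloc[2])  # position 2 is the last element -> 30
```

Using `.loc` and `.iloc` explicitly avoids any ambiguity about which kind of lookup is intended.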


In [143]:
# Basic operations
print("\nSum of elements in series_from_list:", series_from_list.sum())
print("Mean of elements in series_from_list:", series_from_list.mean())


Sum of elements in series_from_list: 150
Mean of elements in series_from_list: 30.0


In [144]:
# Vectorized operations
# Multiplying the Series by 2 returns a new Series; the original is not changed
series_from_list * 2

0     20
1     40
2     60
3     80
4    100
Name: Numbers, dtype: int64
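A short sketch of two properties of vectorized operations: the result is a new Series, and arithmetic between two Series matches elements by label rather than by position. The Series `numbers` and `other` are made up for this illustration.

```python
import pandas as pd

numbers = pd.Series([10, 20, 30], index=["a", "b", "c"])

doubled = numbers * 2      # new Series; `numbers` keeps its values
print(numbers.tolist())    # [10, 20, 30]
print(doubled.tolist())    # [20, 40, 60]

# Same labels in a different order: values are paired by label before adding.
other = pd.Series([1, 2, 3], index=["c", "b", "a"])
total = numbers + other
print(total["a"], total["b"], total["c"])  # 13 22 31
```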

## 1.3 Exploring Series

In [145]:
# Checking for null values
series_from_list.isnull()

0    False
1    False
2    False
3    False
4    False
Name: Numbers, dtype: bool
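Because `series_from_list` has no missing values, every entry above is `False`. The following sketch uses a hypothetical Series with gaps to show what `isnull()` reports when data is missing, plus two common follow-ups, `dropna()` and `fillna()`.

```python
import pandas as pd

# None and NaN are both treated as missing; the Series becomes float64.
with_gaps = pd.Series([10, None, 30, float("nan")], name="Numbers")

print(with_gaps.isnull().tolist())   # [False, True, False, True]
print(with_gaps.isnull().sum())      # 2 missing values
print(with_gaps.dropna().tolist())   # [10.0, 30.0]
print(with_gaps.fillna(0).tolist())  # [10.0, 0.0, 30.0, 0.0]
```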

In [146]:
# Getting summary statistics
series_from_list.describe()

count     5.000000
mean     30.000000
std      15.811388
min      10.000000
25%      20.000000
50%      30.000000
75%      40.000000
max      50.000000
Name: Numbers, dtype: float64

# 2. DataFrame: Creating DataFrames from Various Data Sources
#### pandas reads data from sources such as CSV or TSV files or a SQL (Structured Query Language) database and turns it into a Python object with rows and columns, known as a DataFrame. A DataFrame resembles the tables found in spreadsheet and statistical software (e.g., Excel or SPSS): it stores tabular data as rows of observations and columns of variables, and it lets you manipulate that data and extract information from it.

In [147]:
# Creating a DataFrame from a list of lists
data = [[1, 'Navin', 23], [2, 'Bob', 25], [3, 'Charlie', 24]]
df_from_list = pd.DataFrame(data, columns=['ID', 'Name', 'Age'])
df_from_list


   ID     Name  Age
0   1    Navin   23
1   2      Bob   25
2   3  Charlie   24


In [148]:
# Creating a DataFrame from a dictionary
data_dict = {
    'Name': ['Navin', 'Chris', 'Frank'],
    'Age': [29, 30, 24],
    'Score': [85, 90, 88]
}
df_from_dict = pd.DataFrame(data_dict)
df_from_dict


    Name  Age  Score
0  Navin   29     85
1  Chris   30     90
2  Frank   24     88


In [149]:
# Creating a DataFrame from a CSV file
# Assumes a CSV file named 'office.csv' exists in your working directory
df_from_csv = pd.read_csv('office.csv')
df_from_csv


           name  marks       city
0        Gaurav     96       Gaya
1     Navin Sir     98  Bengaluru
2  Harsh Bhaiya     85    Jodhpur
3        Sushil     88    Bikaner
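If `office.csv` is not at hand, the same call can be tried on in-memory text. This is only a sketch: the CSV content below is an assumed layout modeled on two rows of the output above, and `io.StringIO` stands in for a file on disk.

```python
import io
import pandas as pd

# Assumed CSV layout, modeled on the output shown above.
csv_text = """name,marks,city
Gaurav,96,Gaya
Sushil,88,Bikaner
"""

# read_csv accepts any file-like object, not just a path.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)            # (2, 3)
print(list(df.columns))    # ['name', 'marks', 'city']
print(df["marks"].mean())  # 92.0
```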


# 3. Indexing in Pandas: Setting, Resetting, and Understanding the Index
#### The index of a DataFrame is a sequence of labels that identify each row. The labels can be integers, strings, or any other hashable type. The index is used for label-based access and alignment, and it can be read or replaced through the `DataFrame.index` attribute, which returns a `pandas.Index`.
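A minimal sketch of reading and replacing the index through the `index` attribute. The DataFrame `df` is a small stand-in built for this example.

```python
import pandas as pd

df = pd.DataFrame({"Age": [29, 30, 24], "Score": [85, 90, 88]})

# Without an explicit index, pandas creates a RangeIndex of 0..n-1.
print(df.index)  # RangeIndex(start=0, stop=3, step=1)

# Assigning to the attribute replaces the row labels in place.
df.index = ["Navin", "Chris", "Frank"]
print(list(df.index))            # ['Navin', 'Chris', 'Frank']
print(df.loc["Chris", "Score"])  # 90
```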



### 3.1 Setting an Index

In [150]:
# Setting the 'Name' column as the index
df_with_index = df_from_dict.set_index('Name')
df_with_index


       Age  Score
Name
Navin   29     85
Chris   30     90
Frank   24     88


### 3.2 Resetting an Index

In [151]:
# Resetting the index back to default integer index
df_reset_index = df_with_index.reset_index()
df_reset_index


    Name  Age  Score
0  Navin   29     85
1  Chris   30     90
2  Frank   24     88
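Two details are worth seeing in isolation: `set_index` and `reset_index` return new DataFrames by default rather than modifying the original, and `reset_index(drop=True)` discards the old labels instead of restoring them as a column. The names below are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Navin", "Chris", "Frank"], "Age": [29, 30, 24]})

indexed = df.set_index("Name")
print(list(df.columns))       # ['Name', 'Age']: the original is untouched
print(list(indexed.columns))  # ['Age']: 'Name' moved into the index

restored = indexed.reset_index()          # 'Name' becomes a column again
dropped = indexed.reset_index(drop=True)  # 'Name' labels are discarded
print(list(restored.columns))  # ['Name', 'Age']
print(list(dropped.columns))   # ['Age']
```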


### 3.3 Importance of Index in Data Selection and Filtering

In [152]:
# loc and iloc are the two primary ways to access rows and columns in a DataFrame
# Accessing a row by label with loc
df_with_index.loc['Navin']

Age      29
Score    85
Name: Navin, dtype: int64

In [153]:
# Accessing a row by position with iloc
# iloc selects by integer position, much like indexing a list
df_from_list.iloc[0]

ID          1
Name    Navin
Age        23
Name: 0, dtype: object

In [154]:
# Slicing rows by label with loc (both endpoints are included)
df_with_index.loc['Navin':'Chris']

       Age  Score
Name
Navin   29     85
Chris   30     90
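The label slice above returns both `'Navin'` and `'Chris'` because `loc` slices include the stop label, whereas `iloc` slices exclude the stop position as Python lists do. The sketch below contrasts the two on a stand-in DataFrame and adds a simple boolean filter, one common way to select rows by condition.

```python
import pandas as pd

df = pd.DataFrame(
    {"Age": [29, 30, 24], "Score": [85, 90, 88]},
    index=["Navin", "Chris", "Frank"],
)

print(len(df.loc["Navin":"Chris"]))  # 2: label slices include both endpoints
print(len(df.iloc[0:1]))             # 1: positional slices exclude the stop

# Boolean filtering: keep rows where the condition is True.
high = df[df["Score"] > 86]
print(list(high.index))  # ['Chris', 'Frank']
```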
