# 🐼 Pandas Basics – Day 3 of Data Science Journey

**Author:** Vidhi Pandey  
**Date:** June 2025  
**Goal:** Learn Series, DataFrames, indexing, importing data, and basic data exploration.


In [1]:
import pandas as pd
import numpy as np

## 📌 What is a Pandas Series?

A Series is a one-dimensional labeled array, similar to a single column in Excel. It pairs each value with an index label.


In [2]:
# Create a Series from a list
marks = pd.Series([85, 92, 78, 90], index=["Math", "Physics", "Chemistry", "CS"])
print("Marks Series:")
print(marks)

# Accessing elements
print("\nPhysics Marks:", marks["Physics"])

Marks Series:
Math         85
Physics      92
Chemistry    78
CS           90
dtype: int64

Physics Marks: 92
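
Label lookup with `marks["Physics"]` is only one way to index a Series. The sketch below rebuilds the same `marks` Series and shows label-based access (`.loc`), position-based access (`.iloc`), construction from a dictionary, and boolean filtering. The variable names `by_label`, `by_position`, `marks_from_dict`, and `above_80` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same Series as in the cell above
marks = pd.Series([85, 92, 78, 90], index=["Math", "Physics", "Chemistry", "CS"])

# Label-based access with .loc, position-based access with .iloc
by_label = marks.loc["Chemistry"]   # value stored under the label "Chemistry"
by_position = marks.iloc[0]         # first value, whatever its label is

# A dictionary's keys become the index automatically
marks_from_dict = pd.Series({"Math": 85, "Physics": 92})

# A boolean condition keeps only the matching entries, labels included
above_80 = marks[marks > 80]

print(by_label, by_position)
print(marks_from_dict.index.tolist())
print(above_80.index.tolist())
```

Using `.loc` and `.iloc` explicitly avoids ambiguity when the index labels are themselves integers.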


## 📊 What is a DataFrame?

A DataFrame is a two-dimensional labeled data structure, similar to an Excel sheet or a SQL table.  
It has labeled rows and columns, and each column can hold a different data type.


In [5]:
# Create a DataFrame using dictionary
data = {
    "Name": ["Vidhi", "Aryan", "Neha", "Raj"],
    "Age": [24, 22, 23, 25],
    "Score": [88, 92, 85, 90]
}

df = pd.DataFrame(data)
print("Student DataFrame:")
print(df)

Student DataFrame:
    Name  Age  Score
0  Vidhi   24     88
1  Aryan   22     92
2   Neha   23     85
3    Raj   25     90
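
To see the "rows and columns, different data types" idea concretely, the following sketch rebuilds the same student table and inspects its shape, column names, and per-column dtypes. It is a minimal illustration; the exact dtype shown for the text column can vary between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Vidhi", "Aryan", "Neha", "Raj"],
    "Age": [24, 22, 23, 25],
    "Score": [88, 92, 85, 90],
})

print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # column labels as a plain list
print(df.dtypes)            # one dtype per column

# Selecting a single column returns a Series
score = df["Score"]
print(type(score).__name__)
```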


In [6]:
# First 2 rows
print("\nHead (First 2 rows):")
print(df.head(2))

# Info about the DataFrame
print("\nInfo:")
df.info()  # info() prints its report directly and returns None

# Summary statistics
print("\nDescribe:")
print(df.describe())


Head (First 2 rows):
    Name  Age  Score
0  Vidhi   24     88
1  Aryan   22     92

Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    4 non-null      object
 1   Age     4 non-null      int64 
 2   Score   4 non-null      int64 
dtypes: int64(2), object(1)
memory usage: 228.0+ bytes

Describe:
             Age      Score
count   4.000000   4.000000
mean   23.500000  88.750000
std     1.290994   2.986079
min    22.000000  85.000000
25%    22.750000  87.250000
50%    23.500000  89.000000
75%    24.250000  90.500000
max    25.000000  92.000000
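
The `std` row reported by `describe()` is the sample standard deviation (it divides by n - 1), and the percentile rows use linear interpolation between values. The sketch below recomputes a few of the `Age` figures by hand to show where they come from. The names `sample_std`, `population_std`, `q25`, and `median` are illustrative.

```python
import pandas as pd

ages = pd.Series([24, 22, 23, 25])

sample_std = ages.std()            # default ddof=1: divides by n - 1
population_std = ages.std(ddof=0)  # divides by n instead

q25 = ages.quantile(0.25)          # the "25%" row of describe()
median = ages.quantile(0.5)        # the "50%" row of describe()

print(round(sample_std, 6), round(population_std, 6))
print(q25, median)
```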


In [7]:
# Accessing a column
print("Name column:\n", df["Name"])

# Filter rows where Score > 85
high_scores = df[df["Score"] > 85]
print("\nStudents with Score > 85:\n", high_scores)

Name column:
 0    Vidhi
1    Aryan
2     Neha
3      Raj
Name: Name, dtype: object

Students with Score > 85:
     Name  Age  Score
0  Vidhi   24     88
1  Aryan   22     92
3    Raj   25     90
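
Filters can be combined and paired with column selection. The sketch below, using the same student table, shows `&` for combining conditions, `.loc` for picking rows and columns together, and `isin()` for membership tests. The result names (`young_high`, `names_only`, `selected`) are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Vidhi", "Aryan", "Neha", "Raj"],
    "Age": [24, 22, 23, 25],
    "Score": [88, 92, 85, 90],
})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
young_high = df[(df["Score"] > 85) & (df["Age"] < 25)]

# .loc accepts a row mask and a list of columns in one step
names_only = df.loc[df["Score"] > 85, ["Name", "Score"]]

# isin() keeps rows whose value appears in the given list
selected = df[df["Name"].isin(["Neha", "Raj"])]

print(young_high)
print(names_only)
print(selected)
```

Python's `and`/`or` keywords do not work element-wise on Series, which is why the bitwise operators and the parentheses are needed here.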


In [8]:
# Add new column
df["Pass"] = df["Score"] >= 80
print("\nWith Pass column:\n", df)

# Update Age of a student
df.loc[df["Name"] == "Vidhi", "Age"] = 25

# Drop a column
df = df.drop("Pass", axis=1)
print("\nAfter updating and removing 'Pass' column:\n", df)



With Pass column:
     Name  Age  Score  Pass
0  Vidhi   24     88  True
1  Aryan   22     92  True
2   Neha   23     85  True
3    Raj   25     90  True

After updating and removing 'Pass' column:
     Name  Age  Score
0  Vidhi   25     88
1  Aryan   22     92
2   Neha   23     85
3    Raj   25     90
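
The cell above changes `df` in place. When the original frame should stay untouched, one option is to use methods that return a new DataFrame. The sketch below starts from the original student data (before the age update) and uses `assign()` and `drop(columns=...)`; the names `df_with_pass` and `dropped` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Vidhi", "Aryan", "Neha", "Raj"],
    "Age": [24, 22, 23, 25],
    "Score": [88, 92, 85, 90],
})

# assign() returns a new DataFrame with the extra column; df itself is unchanged
df_with_pass = df.assign(Pass=df["Score"] >= 80)

# drop(columns=...) is an alternative spelling of drop(..., axis=1)
dropped = df_with_pass.drop(columns=["Pass"])

print(df.columns.tolist())
print(df_with_pass.columns.tolist())
print(dropped.columns.tolist())
```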


In [9]:
# Sample dataset from a URL
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"

iris = pd.read_csv(url)
print("Iris Dataset Head:")
print(iris.head())

Iris Dataset Head:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
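
`pd.read_csv` also accepts file-like objects, which makes it possible to try the same exploration steps without a network connection. The sketch below is an assumption-laden stand-in: it feeds only the five rows printed above through `io.StringIO` instead of downloading the full dataset, so the numbers describe this small sample, not the whole Iris data.

```python
import io
import pandas as pd

# The five rows shown in the head() output above, written as CSV text
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)                       # rows and columns in the sample
print(sample["species"].value_counts())   # how many rows per species
print(sample["sepal_length"].mean())      # average of one numeric column
```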


In [25]:
import numpy as np
import pandas as pd

# Data
data = {
    "Name": ["Vidhi", "Aryan", "Neha", "Raj", "Tanya", "Manav"],
    "Age": [24, 23, 22, 25, 21, 22],
    "Subject": ["Math", "Physics", "Chemistry", "Math", "CS", "Biology"],
    "Marks": [88, 79, 91, 85, 95, 67]
}

df = pd.DataFrame(data)

# Display first 3 rows
print("\nHead (First 3 rows):")
print(df.head(3))

# Info
print("\nInfo:")
df.info()  # info() prints directly, so no print() wrapper is needed

# Describe
print("\nDescribe:")
print(df.describe())

# Grade assignment
conditions = [
    (df["Marks"] >= 90),
    (df["Marks"] >= 80) & (df["Marks"] < 90),
    (df["Marks"] >= 70) & (df["Marks"] < 80),
    (df["Marks"] < 70)
]
grades = ["A", "B", "C", "D"]
df["Grade"] = np.select(conditions, grades, default="NA")

# Final output
print("\nWith Grade column:\n", df)

# Math students who scored above 85
math_students = df[(df["Subject"] == "Math") & (df["Marks"] > 85)]
print("\n Math students with marks > 85 are: \n", math_students)

# Update Manav's subject
df.loc[df["Name"] == "Manav", "Subject"] = "Botany"
print("\n After updating Manav's subject:\n", df)

# Save to CSV without writing the index as a column
df.to_csv("students_data.csv", index=False)

Head (First 3 rows):
    Name  Age    Subject  Marks
0  Vidhi   24       Math     88
1  Aryan   23    Physics     79
2   Neha   22  Chemistry     91

Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Name     6 non-null      object
 1   Age      6 non-null      int64 
 2   Subject  6 non-null      object
 3   Marks    6 non-null      int64 
dtypes: int64(2), object(2)
memory usage: 324.0+ bytes

Describe:
             Age      Marks
count   6.000000   6.000000
mean   22.833333  84.166667
std     1.471960  10.008330
min    21.000000  67.000000
25%    22.000000  80.500000
50%    22.500000  86.500000
75%    23.750000  90.250000
max    25.000000  95.000000

With Grade column:
     Name  Age    Subject  Marks Grade
0  Vidhi   24       Math     88     B
1  Aryan   23    Physics     79     C
2   Neha   22  Chemistry     91     A
3    Raj   25       Math     85     B
4  Tanya   21         CS     95     A
5  Manav   22    Biology     67     D
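
The grade bands in the cell above can also be expressed as bins. The sketch below is an alternative, not the notebook's method: it uses `pd.cut` with left-closed intervals and assumes marks fall between 0 and 100. It reproduces the same letters that `np.select` assigns for these six marks.

```python
import pandas as pd

marks = pd.Series([88, 79, 91, 85, 95, 67])

# right=False makes each bin include its left edge:
# [0, 70) -> D, [70, 80) -> C, [80, 90) -> B, [90, 101) -> A
grades = pd.cut(
    marks,
    bins=[0, 70, 80, 90, 101],
    labels=["D", "C", "B", "A"],
    right=False,
)

print(grades.tolist())
```

`pd.cut` returns a categorical Series, which keeps the grade order available for later sorting or grouping.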

In [26]:
new_df = pd.read_csv("students_data.csv")
print(new_df)

    Name  Age    Subject  Marks Grade
0  Vidhi   24       Math     88     B
1  Aryan   23    Physics     79     C
2   Neha   22  Chemistry     91     A
3    Raj   25       Math     85     B
4  Tanya   21         CS     95     A
5  Manav   22     Botany     67     D
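
The `index=False` argument used when saving matters for this round trip. The sketch below shows why, using in-memory buffers (`io.StringIO`) instead of files and a two-row stand-in frame: without `index=False`, the row index is written as an extra unnamed column and comes back on reload.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Vidhi", "Aryan"], "Marks": [88, 79]})

# Default behaviour: the index is written as an unnamed first column
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False: only the real columns are written
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(reloaded_with_index.columns.tolist())
print(reloaded_clean.columns.tolist())
```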
