# Pandas
### 1. What is Pandas?
#### Pandas is a Python library for working with data sets.
#### It provides functions for analyzing, cleaning, exploring, and manipulating data.
#### The name "Pandas" refers to both "Panel Data" and "Python Data Analysis".

### 2. Why Use Pandas?
#### Pandas lets us analyze large data sets and draw conclusions from them using statistical methods.
#### Pandas can also clean messy data sets, making them readable and relevant.

# 
## Installing Pandas
#### Terminal: `pip install pandas`   Jupyter Notebook: `!pip install pandas`

# 
## Importing Pandas

In [3]:
import pandas as pd
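
To confirm that the import worked, one option is to print the installed version. This is a minimal sketch; the version string it prints depends on your environment.

```python
import pandas as pd

# Print the installed pandas version; the exact value varies by installation.
print(pd.__version__)
```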

# 
## 3. Basic Data Structures
### 3.1 Series
#### Creating a Series

In [4]:
data = [10, 20, 30, 40]
series = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(series)

a    10
b    20
c    30
d    40
dtype: int64


In [13]:
#Accessing elements
print(series['b'])

print(series['d'])

20
40
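
Besides label lookup, a Series also supports positional access and element-wise arithmetic. The sketch below rebuilds the same Series so it runs on its own; the variable names `first`, `subset`, and `doubled` are illustrative only.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

first = series.iloc[0]        # positional access: the first element
subset = series[['a', 'c']]   # a list of labels returns a smaller Series
doubled = series * 2          # arithmetic is applied to every element

print(first)
print(subset)
print(doubled)
```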


# 
### 3.2 DataFrame
#### Creating a DataFrame

In [6]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [90, 85, 88]
}
df = pd.DataFrame(data)
print(df)

      Name  Age  Score
0    Alice   25     90
1      Bob   30     85
2  Charlie   35     88


In [11]:
#Accessing Columns

print(df['Name']) #accessing 'Name' column

print(df['Score']) #accessing 'Score' column

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object
0    90
1    85
2    88
Name: Score, dtype: int64


In [12]:
#Accessing Rows

print(df.loc[0]) #accessing first row

print(df.loc[2]) #accessing third row

Name     Alice
Age         25
Score       90
Name: 0, dtype: object
Name     Charlie
Age           35
Score         88
Name: 2, dtype: object
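
Column and row access can also be combined. The following is a small sketch, assuming the same three-row DataFrame, that selects several columns at once, reads a single cell, and slices rows by label; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [90, 85, 88],
})

name_and_score = df[['Name', 'Score']]  # a list of column names returns a DataFrame
bob_score = df.loc[1, 'Score']          # row label 1, column 'Score'
first_two = df.loc[0:1]                 # label slices include the end label

print(name_and_score)
print(bob_score)
print(first_two)
```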


# 
## 4. Basic Operations
##
### 4.1 Viewing Data

In [15]:
print(df.head())  # First 5 rows

      Name  Age  Score
0    Alice   25     90
1      Bob   30     85
2  Charlie   35     88


In [16]:
print(df.tail())  # Last 5 rows

      Name  Age  Score
0    Alice   25     90
1      Bob   30     85
2  Charlie   35     88


In [17]:
print(df.info())  # Data summary

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Age     3 non-null      int64 
 2   Score   3 non-null      int64 
dtypes: int64(2), object(1)
memory usage: 204.0+ bytes
None


In [18]:
print(df.describe())  # Statistical summary

        Age      Score
count   3.0   3.000000
mean   30.0  87.666667
std     5.0   2.516611
min    25.0  85.000000
25%    27.5  86.500000
50%    30.0  88.000000
75%    32.5  89.000000
max    35.0  90.000000
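
A few attributes give a quick overview without printing any rows. This sketch assumes the same three-row DataFrame as above.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [90, 85, 88],
})

shape = df.shape              # (number of rows, number of columns)
columns = list(df.columns)    # column names in order
dtypes = df.dtypes            # one dtype per column

print(shape)
print(columns)
print(dtypes)
```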


# 
### 4.2 Adding/Modifying Columns

In [19]:
df['Passed'] = df['Score'] > 85
print(df)

      Name  Age  Score  Passed
0    Alice   25     90    True
1      Bob   30     85   False
2  Charlie   35     88    True
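
The same assignment syntax overwrites an existing column, and `assign` returns a new DataFrame with an extra column. A minimal sketch; the column name `AgeInMonths` and the 5-point bonus are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [90, 85, 88],
})

df['Score'] = df['Score'] + 5               # overwrite an existing column
df = df.assign(AgeInMonths=df['Age'] * 12)  # assign() returns a new DataFrame

print(df)
```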


#
### 4.3 Dropping Columns/Rows

In [20]:
df = df.drop(columns=['Passed'])
df = df.drop(index=0)
print(df)

      Name  Age  Score
1      Bob   30     85
2  Charlie   35     88
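
Note that dropping a row leaves the remaining index labels unchanged (here 1 and 2). If a fresh 0-based index is wanted, `reset_index(drop=True)` is one way to get it. A small sketch with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [90, 85, 88],
})

trimmed = df.drop(index=0)                   # labels 1 and 2 remain
renumbered = trimmed.reset_index(drop=True)  # labels become 0 and 1

print(trimmed)
print(renumbered)
```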


# 
## 5. Data Selection
#
### 5.1 Indexing

In [21]:
print(df.iloc[1])  # By position: the second row, which carries label 2 after the drop

Name     Charlie
Age           35
Score         88
Name: 2, dtype: object


In [22]:
print(df.loc[1])   # By label: the row whose index label is 1

Name     Bob
Age       30
Score     85
Name: 1, dtype: object


#
### 5.2 Filtering

In [23]:
print(df[df['Age'] > 28])

      Name  Age  Score
1      Bob   30     85
2  Charlie   35     88


In [24]:
print(df[df['Name'] == "Bob"])

  Name  Age  Score
1  Bob   30     85
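
Conditions can be combined with `&` (and) and `|` (or), with each condition wrapped in parentheses, and `isin` matches against a list of values. A minimal sketch on the original three-row data; the thresholds are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [90, 85, 88],
})

# Both conditions must hold; the parentheses are required.
older_and_high = df[(df['Age'] > 28) & (df['Score'] > 86)]

# Keep rows whose Name is one of the listed values.
selected = df[df['Name'].isin(['Alice', 'Bob'])]

print(older_and_high)
print(selected)
```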


# 
## 6. Handling Missing Data
#
### 6.1 Detecting Missing Values

In [25]:
df.loc[3] = [None, 28, None]
print(df.isnull())

    Name    Age  Score
1  False  False  False
2  False  False  False
3   True  False   True


# 
### 6.2 Filling Missing Values

In [26]:
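# Assign the result back; calling fillna(inplace=True) on a selected column may not update df in recent pandas versions
df['Score'] = df['Score'].fillna(df['Score'].mean())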

# 
### 6.3 Dropping Missing Values

In [27]:
df.dropna(inplace=True)
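
The two cells above print nothing, so here is a self-contained sketch of the same steps with visible results. It assumes a small DataFrame shaped like the one above (one row missing both `Name` and `Score`); the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Bob', 'Charlie', None],
    'Age': [30, 35, 28],
    'Score': [85.0, 88.0, None],
})

missing_per_column = df.isnull().sum()             # count of missing values per column
filled = df.fillna({'Score': df['Score'].mean()})  # fill only the 'Score' column
complete = filled.dropna()                         # drop rows that still have a missing value

print(missing_per_column)
print(filled)
print(complete)
```

The row with the missing name survives `fillna` but is removed by `dropna`, which is why only two rows remain afterwards.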

#
## 7. Operations on DataFrames
#
### 7.1 Sorting

In [28]:
print(df.sort_values(by='Age'))

      Name   Age  Score
1      Bob  30.0   85.0
2  Charlie  35.0   88.0
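
`sort_values` also accepts a sort direction and several sort keys. A minimal sketch, assuming a four-row DataFrame chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 25, 30],
    'Score': [90, 85, 88, 92],
})

# Highest score first.
by_score_desc = df.sort_values(by='Score', ascending=False)

# Youngest first; within the same age, highest score first.
by_age_then_score = df.sort_values(by=['Age', 'Score'], ascending=[True, False])

print(by_score_desc)
print(by_age_then_score)
```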


#
### 7.2 Grouping

In [34]:
grouped = df.groupby('Age')
# numeric_only=True averages only the numeric columns; without it, the text column 'Name' makes mean() fail in pandas 2.0 and later
print(grouped.mean(numeric_only=True))
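
A common pattern is to select the column of interest before aggregating, which sidesteps non-numeric columns entirely. The sketch below assumes a four-row DataFrame with repeated ages so that the groups contain more than one row; the output column names `mean_score` and `n_students` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 25, 30],
    'Score': [90, 85, 88, 92],
})

# Mean score per age group, computed on the 'Score' column only.
mean_score = df.groupby('Age')['Score'].mean()

# Several aggregations at once, using named aggregation.
summary = df.groupby('Age').agg(
    mean_score=('Score', 'mean'),
    n_students=('Name', 'count'),
)

print(mean_score)
print(summary)
```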

#
### 7.3 Merging and Joining

In [35]:
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2], 'Score': [90, 85]})
merged = pd.merge(df1, df2, on='ID')
print(merged)

   ID   Name  Score
0   1  Alice     90
1   2    Bob     85
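
By default `pd.merge` performs an inner join, keeping only keys present in both frames. The `how` parameter changes that. A minimal sketch, assuming two frames whose `ID` values only partly overlap:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Score': [90, 85, 70]})

inner = pd.merge(df1, df2, on='ID')               # IDs found in both frames
left = pd.merge(df1, df2, on='ID', how='left')    # every row of df1; missing scores become NaN
outer = pd.merge(df1, df2, on='ID', how='outer')  # every ID from either frame

print(inner)
print(left)
print(outer)
```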


#
## 8. Data Cleaning
#
### 8.1 Renaming Columns

In [38]:
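# Recreate the four-row DataFrame shown in the output below (the cell that defined it is missing from this export)
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 25, 30], 'Score': [90, 85, 88, 92]})
df.rename(columns={'Name': 'Student Name'}, inplace=True)
print(df)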

  Student Name  Age  Score
0        Alice   25     90
1          Bob   30     85
2      Charlie   25     88
3        David   30     92


#
### 8.2 Dropping Duplicates

In [39]:
df.drop_duplicates(inplace=True)
print(df)

  Student Name  Age  Score
0        Alice   25     90
1          Bob   30     85
2      Charlie   25     88
3        David   30     92
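
The DataFrame above contains no duplicate rows, so the output is unchanged. The sketch below uses made-up data that does contain duplicates, and also shows the `subset` parameter, which compares only the listed columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Student Name': ['Alice', 'Bob', 'Alice', 'Bob'],
    'Age': [25, 30, 25, 31],
    'Score': [90, 85, 90, 85],
})

# Row 2 repeats row 0 exactly and is removed; row 3 differs in Age and stays.
exact = df.drop_duplicates()

# Compare names only, keeping the first occurrence of each.
by_name = df.drop_duplicates(subset=['Student Name'], keep='first')

print(exact)
print(by_name)
```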


#
## 9. Working with Dates
#
### 9.1 Converting to datetime

In [41]:
df['Date'] = pd.to_datetime(['2024-01-01', '2024-02-01', '2024-03-01', '2020-03-05'])
print(df)

  Student Name  Age  Score       Date
0        Alice   25     90 2024-01-01
1          Bob   30     85 2024-02-01
2      Charlie   25     88 2024-03-01
3        David   30     92 2020-03-05


#
### 9.2 Extracting components

In [44]:
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
print(df)

  Student Name  Age  Score       Date  Year  Month  Day
0        Alice   25     90 2024-01-01  2024      1    1
1          Bob   30     85 2024-02-01  2024      2    1
2      Charlie   25     88 2024-03-01  2024      3    1
3        David   30     92 2020-03-05  2020      3    5
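
Once a column holds datetimes, it can also be used for filtering and for deriving text labels. A minimal sketch with the same four dates; the `Weekday` column name and the cutoff date are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Student Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Date': pd.to_datetime(['2024-01-01', '2024-02-01', '2024-03-01', '2020-03-05']),
})

# Weekday name for each date (English names by default).
df['Weekday'] = df['Date'].dt.day_name()

# Keep rows dated on or after 1 February 2024.
recent = df[df['Date'] >= pd.Timestamp('2024-02-01')]

print(df)
print(recent)
```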


#
## 10. File Handling
#
### 10.1 Reading Files

In [45]:
#CSV files
df = pd.read_csv('100 Sales Records.csv')

In [46]:
print(df)

                               Region                Country        Item Type  \
0               Australia and Oceania                 Tuvalu        Baby Food   
1   Central America and the Caribbean                Grenada           Cereal   
2                              Europe                 Russia  Office Supplies   
3                  Sub-Saharan Africa  Sao Tome and Principe           Fruits   
4                  Sub-Saharan Africa                 Rwanda  Office Supplies   
..                                ...                    ...              ...   
95                 Sub-Saharan Africa                   Mali          Clothes   
96                               Asia               Malaysia           Fruits   
97                 Sub-Saharan Africa           Sierra Leone       Vegetables   
98                      North America                 Mexico    Personal Care   
99                 Sub-Saharan Africa             Mozambique        Household   

   Sales Channel Order Prio

In [47]:
# Excel files (reading .xlsx requires the openpyxl package to be installed)
df = pd.read_excel('business-operations-survey-2023-CSV-notes.xlsx')

In [48]:
print(df)

                                            Footnotes  \
0                                                   1   
1                                                   2   
2                                                   3   
3                                                   4   
4                                                   5   
5                                                   6   
6                                                   7   
7                                                   8   
8                                                   9   
9                                                  10   
10                                                 11   
11                                                 12   
12                                            Symbols   
13                                                  .   
14                                                  C   
15                                                  R   
16                             

#
### 10.2 Writing Files

In [50]:
# CSV and EXCEL

# Example DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [90, 85, 88]
}
df = pd.DataFrame(data)

# Save as CSV
df.to_csv('output.csv', index=False)

# Save as Excel
df.to_excel('output.xlsx', index=False)


In [51]:
df = pd.read_csv('output.csv')
print(df)

      Name  Age  Score
0    Alice   25     90
1      Bob   30     85
2  Charlie   35     88


In [52]:
df = pd.read_excel('output.xlsx')
print(df)

      Name  Age  Score
0    Alice   25     90
1      Bob   30     85
2  Charlie   35     88
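
`read_csv` can also load only part of a file. The sketch below writes the CSV to an in-memory buffer instead of a file on disk, so it runs without touching the file system; `usecols` and `nrows` then limit which columns and how many rows are read back.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [90, 85, 88],
})

# Write the CSV text into an in-memory buffer instead of 'output.csv'.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read back only two columns and the first two data rows.
subset = pd.read_csv(io.StringIO(csv_text), usecols=['Name', 'Score'], nrows=2)

print(csv_text)
print(subset)
```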
