## Introduction to pandas: Series, DataFrame, data manipulation


## Pandas Overview

#### Pandas is a powerful Python library for data manipulation and analysis. It provides two primary data structures: Series and DataFrame.

### Installation: You can install Pandas using pip:
#### `pip install pandas`

### Series

#### Series is a one-dimensional labeled array capable of holding any data type. It is similar to a one-dimensional NumPy array but with additional functionality and a labeled index.



In [1]:
#Code Example:
import pandas as pd

# Creating a Series from a Python list
s = pd.Series([1, 2, 3, 4, 5])
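
The labeled index is what distinguishes a Series from a plain list. As a minimal sketch (the labels and the name `s_labeled` are illustrative and not part of the original notebook), a Series can be given custom labels and then accessed either by label or by position:

```python
import pandas as pd

# Illustrative Series with custom string labels instead of the default 0..n-1
s_labeled = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s_labeled["b"])            # access by label
print(s_labeled.iloc[0])         # access by position
print(s_labeled.index.tolist())  # the labels themselves
```

Using `.iloc` for positional access keeps the intent explicit when the index is not made of integers.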


## DataFrame

#### DataFrame is a two-dimensional labeled data structure with columns of potentially different data types. It is similar to a spreadsheet or SQL table.


In [2]:
#Code Example:
# Creating a DataFrame from a dictionary
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [30, 25, 35],
        'City': ['New York', 'Paris', 'London']}
df = pd.DataFrame(data)
print(df)


    Name  Age      City
0   John   30  New York
1  Alice   25     Paris
2    Bob   35    London
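
To see that each column can carry its own data type, one option is to inspect `shape` and `dtypes`. This is a small sketch that rebuilds the same data under the illustrative name `people`:

```python
import pandas as pd

# Same data as the example above, under an illustrative name
people = pd.DataFrame({
    "Name": ["John", "Alice", "Bob"],
    "Age": [30, 25, 35],
    "City": ["New York", "Paris", "London"],
})

print(people.shape)   # (number of rows, number of columns)
print(people.dtypes)  # one dtype per column
```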


## Data Manipulation

#### Indexing and Slicing: You can access and manipulate data in Series and DataFrame using indexing and slicing.

In [3]:

# Accessing elements in a Series
print(s[0])  # Accessing the first element

# Selecting a single column from a DataFrame (returns a Series)
print(df['Name'])


1
0     John
1    Alice
2      Bob
Name: Name, dtype: object
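
Beyond the square-bracket access shown above, pandas offers `.loc` (label-based) and `.iloc` (position-based) indexers. The sketch below assumes a small illustrative frame named `people`; the variable names are not from the original notebook:

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["John", "Alice", "Bob"],
    "Age": [30, 25, 35],
    "City": ["New York", "Paris", "London"],
})

first_row = people.iloc[0]                # first row, selected by position
ages = people.loc[0:1, "Age"]             # label slice: both endpoints are included
subset = people.loc[:, ["Name", "City"]]  # all rows, two columns

print(first_row)
print(ages)
print(subset)
```

Note that slicing with `.loc` includes the end label, whereas `.iloc` follows the usual Python convention of excluding the end position.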


### Data Combination

#### Data combination is the process of joining, merging, or concatenating multiple pandas data structures into a single data structure.
#### The most common ways to combine DataFrames are concatenation, merging, and joining.

### Concatenation

#### Use the `concat` function to combine two or more DataFrames either vertically (stacking rows) or horizontally (placing columns side by side).

In [4]:
df_1 = pd.DataFrame({'A': [4, 2, 6], 'B': ['m', 'n', 'o']})
df_2 = pd.DataFrame({'A': [45, 8, 9], 'B': ['x', 'y', 'z']})

In [5]:
df_1

   A  B
0  4  m
1  2  n
2  6  o


In [6]:
df_2

    A  B
0  45  x
1   8  y
2   9  z


In [7]:
# Vertical concatenation (row-wise)
vertical_com = pd.concat([df_1, df_2], axis=0, keys=["df_1", "df_2"]) 
vertical_com

         A  B
df_1 0   4  m
     1   2  n
     2   6  o
df_2 0  45  x
     1   8  y
     2   9  z


In [8]:
vertical_com.loc["df_1"]

   A  B
0  4  m
1  2  n
2  6  o


In [9]:
vertical_com.loc["df_1",0]

A    4
B    m
Name: (df_1, 0), dtype: object

In [10]:
# Use .iloc for positional access; plain [0] on a label-indexed Series is deprecated
vertical_com.loc["df_1",0].iloc[0]

4

In [11]:
vertical_com.loc["df_1",0]["A":"B"]

A    4
B    m
Name: (df_1, 0), dtype: object

In [12]:
# Vertical concatenation (row-wise), discarding the original indexes
vertical_com = pd.concat([df_1, df_2], axis=0, ignore_index=True)
vertical_com

    A  B
0   4  m
1   2  n
2   6  o
3  45  x
4   8  y
5   9  z


In [13]:
# Horizontal concatenation (column-wise)
horizontal_com = pd.concat([df_1, df_2], axis=1, keys=["df_1","df_2"]) 
horizontal_com

  df_1    df_2
     A  B    A  B
0    4  m   45  x
1    2  n    8  y
2    6  o    9  z


In [14]:
horizontal_com["df_1"]

   A  B
0  4  m
1  2  n
2  6  o


In [15]:
horizontal_com["df_2"]

    A  B
0  45  x
1   8  y
2   9  z


In [16]:
horizontal_com["df_1"][["A","B"]]

   A  B
0  4  m
1  2  n
2  6  o


In [17]:
accra =pd.DataFrame({'Town':['Circle','Madina','East Legon'],
                    'Humidity':[32,29,34],
                    'Temperature':[28,30,25]})
kumasi =pd.DataFrame({'Town':['Kejetia','Amakom','Mantia'],
                    'Humidity':[27,39,36],
                    'Temperature':[30,26,24]})


In [18]:
df_3 = pd.concat([accra,kumasi], ignore_index=True)
df_3

         Town  Humidity  Temperature
0      Circle        32           28
1      Madina        29           30
2  East Legon        34           25
3     Kejetia        27           30
4      Amakom        39           26
5      Mantia        36           24


In [19]:
df_4 = pd.DataFrame({'Windspeed':[8,5,7,8,6,4,9,10,5]})
df_4

   Windspeed
0          8
1          5
2          7
3          8
4          6
5          4
6          9
7         10
8          5


In [20]:
pd.concat([df_3,df_4], ignore_index=True, axis=0)

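          Town  Humidity  Temperature  Windspeed
0       Circle      32.0         28.0        NaN
1       Madina      29.0         30.0        NaN
2   East Legon      34.0         25.0        NaN
3      Kejetia      27.0         30.0        NaN
4       Amakom      39.0         26.0        NaN
5       Mantia      36.0         24.0        NaN
6          NaN       NaN          NaN        8.0
7          NaN       NaN          NaN        5.0
8          NaN       NaN          NaN        7.0
9          NaN       NaN          NaN        8.0
10         NaN       NaN          NaN        6.0
11         NaN       NaN          NaN        4.0
12         NaN       NaN          NaN        9.0
13         NaN       NaN          NaN       10.0
14         NaN       NaN          NaN        5.0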


In [21]:
pd.concat([df_3,df_4], ignore_index=True, axis=1)

            0     1     2   3
0      Circle  32.0  28.0   8
1      Madina  29.0  30.0   5
2  East Legon  34.0  25.0   7
3     Kejetia  27.0  30.0   8
4      Amakom  39.0  26.0   6
5      Mantia  36.0  24.0   4
6         NaN   NaN   NaN   9
7         NaN   NaN   NaN  10
8         NaN   NaN   NaN   5
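
The missing values in the two results above arise because `concat` aligns the inputs on their labels and, by default, keeps the union of those labels. A minimal sketch of the alternative, using the `join="inner"` option of `pd.concat` on two small illustrative frames (`left` and `right` are made-up names):

```python
import pandas as pd

left = pd.DataFrame({"Town": ["Circle", "Madina"], "Humidity": [32, 29]})
right = pd.DataFrame({"Windspeed": [8, 5, 7]})  # has one extra row (index 2)

# Default join="outer": keep every index label, fill gaps with NaN
outer = pd.concat([left, right], axis=1)

# join="inner": keep only index labels present in both inputs
inner = pd.concat([left, right], axis=1, join="inner")

print(outer)
print(inner)
```

Whether to keep or drop the unmatched rows depends on the analysis; the inner variant simply avoids introducing missing values.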


### Merge


#### The `merge` function combines two DataFrames based on one or more common columns. The `how` argument controls which rows are kept: `inner` (the default), `outer`, `left`, or `right`.

In [22]:
df_5 = pd.DataFrame({'Names': ['Ama', 'Barry', 'Celestine', 'Dela'], 'Score1': [1, 2, 3, 4]})
df_6 = pd.DataFrame({'Names': ['Barry', 'Dela', 'Emma', 'Frank'], 'Score2': [5, 6, 7, 8]})

In [23]:
df_5

       Names  Score1
0        Ama       1
1      Barry       2
2  Celestine       3
3       Dela       4


In [24]:
pd.merge(df_5, df_6, on='Names', how='inner')

   Names  Score1  Score2
0  Barry       2       5
1   Dela       4       6


In [25]:
pd.merge(df_5, df_6, on='Names', how='outer')

       Names  Score1  Score2
0        Ama     1.0     NaN
1      Barry     2.0     5.0
2  Celestine     3.0     NaN
3       Dela     4.0     6.0
4       Emma     NaN     7.0
5      Frank     NaN     8.0


In [26]:
pd.merge(df_5, df_6, on='Names', how='left')

       Names  Score1  Score2
0        Ama       1     NaN
1      Barry       2     5.0
2  Celestine       3     NaN
3       Dela       4     6.0


In [27]:
pd.merge(df_5, df_6, on='Names', how='right')

   Names  Score1  Score2
0  Barry     2.0       5
1   Dela     4.0       6
2   Emma     NaN       7
3  Frank     NaN       8


In [28]:
# The two DataFrames use different names for the key column
df_7 = pd.DataFrame({'Name1': ['Ama', 'Barry', 'Celestine', 'Dela'], 'Score': [1, 2, 3, 4]})
df_8 = pd.DataFrame({'Name2': ['Barry', 'Dela', 'Emma', 'Frank'], 'Score': [5, 6, 7, 8]})
merged_df = pd.merge(df_7, df_8, left_on='Name1', right_on='Name2', how='inner', suffixes=('_df7', '_df8'))
merged_df

   Name1  Score_df7  Name2  Score_df8
0  Barry          2  Barry          5
1   Dela          4   Dela          6


In [29]:
df_7.merge(df_8,left_on='Name1', right_on='Name2', how='inner', suffixes=('_df7', '_df8'))

   Name1  Score_df7  Name2  Score_df8
0  Barry          2  Barry          5
1   Dela          4   Dela          6
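
When comparing the different `how` options, it can help to see where each row of the result came from. One way to do this, sketched here on two small illustrative frames (`scores_1` and `scores_2` are made-up names), is the `indicator=True` option of `pd.merge`, which adds a `_merge` column:

```python
import pandas as pd

scores_1 = pd.DataFrame({"Names": ["Ama", "Barry", "Dela"], "Score1": [1, 2, 4]})
scores_2 = pd.DataFrame({"Names": ["Barry", "Dela", "Emma"], "Score2": [5, 6, 7]})

# indicator=True adds a "_merge" column with the values
# "left_only", "right_only", or "both"
merged = pd.merge(scores_1, scores_2, on="Names", how="outer", indicator=True)

print(merged[["Names", "_merge"]])
```

Filtering on `_merge` is a convenient way to find keys that exist in only one of the two inputs.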


###  Join

#### The `join()` method is used to combine two DataFrames on their indexes.

In [30]:
df_9 = pd.DataFrame({'value1': [1, 2, 3, 4]}, index=['A', 'B', 'C', 'D'])
df_10 = pd.DataFrame({'value2': [5, 6, 7, 8]}, index=['B', 'D', 'E', 'F'])


In [31]:
df_9

   value1
A       1
B       2
C       3
D       4


In [32]:
df_9.join(df_10, how="inner")

   value1  value2
B       2       5
D       4       6


In [33]:
joined =df_9.join(df_10, how='left', lsuffix='_left', rsuffix='_right')
joined

   value1  value2
A       1     NaN
B       2     5.0
C       3     NaN
D       4     6.0
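
Since `join()` matches rows on the index, the same result can usually be obtained from `merge` by asking it to use the indexes as keys. The sketch below rebuilds the two frames under the illustrative names `left` and `right` and compares both approaches:

```python
import pandas as pd

left = pd.DataFrame({"value1": [1, 2, 3, 4]}, index=["A", "B", "C", "D"])
right = pd.DataFrame({"value2": [5, 6, 7, 8]}, index=["B", "D", "E", "F"])

# Index-based join
via_join = left.join(right, how="inner")

# The same operation expressed with merge, using both indexes as keys
via_merge = pd.merge(left, right, left_index=True, right_index=True, how="inner")

print(via_join)
print(via_join.equals(via_merge))
```

The `lsuffix` and `rsuffix` arguments of `join()` only take effect when both frames share a column name; with distinct column names, as here, they can be omitted.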


## Filtering: You can filter data based on conditions.

In [34]:


# Filtering rows based on condition
filtered_df = df[df['Age'] > 30]
print(filtered_df)


  Name  Age    City
2  Bob   35  London
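
Conditions can also be combined. As a small sketch on an illustrative copy of the data (named `people` here), `&` means "and", `|` means "or", and each condition needs its own parentheses; `isin` checks membership in a list of values:

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["John", "Alice", "Bob"],
    "Age": [30, 25, 35],
    "City": ["New York", "Paris", "London"],
})

# Rows where both conditions hold
adults_not_paris = people[(people["Age"] >= 30) & (people["City"] != "Paris")]

# Rows whose City is one of the listed values
in_cities = people[people["City"].isin(["Paris", "London"])]

print(adults_not_paris)
print(in_cities)
```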


## Adding and Removing Columns: 
#### You can add new columns or remove existing ones.


In [35]:
#Code Example:
# Creating a DataFrame from a dictionary
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [30, 25, 35],
        'City': ['New York', 'Paris', 'London']}
df = pd.DataFrame(data)
print(df)


    Name  Age      City
0   John   30  New York
1  Alice   25     Paris
2    Bob   35    London


In [36]:

# Adding a new column
df['Gender'] = ['Male', 'Female', 'Male']

# Removing a column
df.drop('City', axis=1, inplace=True)

print(df)

    Name  Age  Gender
0   John   30    Male
1  Alice   25  Female
2    Bob   35    Male
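
The example above modifies `df` in place. An alternative, sketched below with illustrative names, is to use `assign` and `drop(columns=...)`, both of which return a new DataFrame and leave the original untouched:

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["John", "Alice", "Bob"],
    "Age": [30, 25, 35],
    "City": ["New York", "Paris", "London"],
})

# assign returns a copy with the extra column
with_gender = people.assign(Gender=["Male", "Female", "Male"])

# drop(columns=...) returns a copy without the named column
without_city = with_gender.drop(columns="City")

print(without_city)
print(people.columns.tolist())  # the original frame is unchanged
```

Returning new objects makes it easier to chain steps and to rerun notebook cells without side effects.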


## Grouping and Aggregation: 
#### You can group data based on one or more columns and perform aggregation functions.


In [37]:

# Creating a DataFrame from a dictionary
data = {'Num': [10, 20, 30],
        'Age': [30, 25, 35],
        'val': [50, 60, 70]}
df = pd.DataFrame(data)
print(df)


   Num  Age  val
0   10   30   50
1   20   25   60
2   30   35   70


In [38]:
# Grouping and aggregating
grouped_df = df.groupby('Age').mean()
grouped_df

      Num   val
Age
25   20.0  60.0
30   10.0  50.0
35   30.0  70.0
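
In the example above every `Age` value is unique, so each group holds a single row and the mean equals the original value. Grouping becomes more informative when keys repeat. The following sketch uses made-up data (`sales`, `Region`, and `Units` are illustrative names) and computes several aggregates at once with `agg`:

```python
import pandas as pd

sales = pd.DataFrame({
    "Region": ["North", "South", "North", "South"],
    "Units": [10, 20, 30, 40],
})

# One row per region, one column per aggregation function
summary = sales.groupby("Region")["Units"].agg(["mean", "sum", "count"])

print(summary)
```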


### Handling Missing Data: 
#### Pandas provides methods for handling missing data, such as dropping or filling missing values.

### Example:

#### Dropping rows with missing values: `cleaned_df = df.dropna()`

#### Filling missing values with a specified value: `filled_df = df.fillna(0)`


In [39]:

# Creating a DataFrame from a dictionary
data = {'A': [1, 2, None,4],
        'B': [None, 5, 6,7]}
df = pd.DataFrame(data)
# Check for missing values (True marks a missing entry)
print(df.isnull())


       A      B
0  False   True
1  False  False
2   True  False
3  False  False
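
To connect the `dropna` and `fillna` methods mentioned earlier with this data, here is a minimal sketch that applies them to the same values (the names `frame`, `cleaned`, `filled`, and `filled_mean` are illustrative):

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2, None, 4], "B": [None, 5, 6, 7]})

cleaned = frame.dropna()                   # keep only rows with no missing values
filled = frame.fillna(0)                   # replace every missing value with 0
filled_mean = frame.fillna(frame.mean())   # replace with each column's own mean

print(cleaned)
print(filled)
print(filled_mean)
```

Which strategy is appropriate depends on the data: dropping rows loses information, while filling with a constant or a mean can distort later statistics.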


### Reading Files with Pandas:

### Import Pandas:

#### Import the Pandas library in your Python script or Jupyter Notebook:
#### `import pandas as pd`

### Reading CSV Files:

#### Use the read_csv() function to read data from a CSV (Comma-Separated Values) file.

### Syntax:
#### `df = pd.read_csv('filename.csv')`

### Options:
#### `sep`: Specifies the delimiter (default is `,`).
#### `header`: Specifies the row number to use as column names.
#### `index_col`: Specifies the column to use as the row index.
#### `usecols`: Specifies a subset of columns to read.
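
The options listed above can be tried without a file on disk by reading from an in-memory string. This is a sketch under that assumption: the CSV text, the semicolon delimiter, and the name `df_csv` are made up for illustration.

```python
import io

import pandas as pd

# Illustrative CSV content that uses ';' instead of ',' as the delimiter
csv_text = "Name;Age;City\nJohn;30;New York\nAlice;25;Paris\n"

df_csv = pd.read_csv(
    io.StringIO(csv_text),     # a file path would normally go here
    sep=";",                   # delimiter used in the data
    header=0,                  # first row holds the column names
    index_col="Name",          # use the Name column as the row index
    usecols=["Name", "Age"],   # read only these columns
)

print(df_csv)
```

Passing a path such as `'filename.csv'` in place of the `io.StringIO` object reads from disk with the same options.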




In [40]:
import pandas as pd

# Reading a CSV file, stating the delimiter explicitly (',' is the default)
df1 = pd.read_csv('sample_data.csv', sep=',')
print(df1.head())  # Displaying the first few rows of the DataFrame



    Name  Age           City
0   John   30       New York
1  Alice   25    Los Angeles
2    Bob   35        Chicago
3  Emily   28        Houston
4  David   32  San Francisco


In [41]:
import pandas as pd

# File path of the Excel file
excel_file_path = "sample_data.xlsx"

# Read the Excel file using pandas
df = pd.read_excel(excel_file_path,
                   header=0,  # Use the first row as column names
                   index_col=None,  # Do not use any column as index
                   usecols=None  # Read all columns
                   )

# Display the DataFrame
print(df)


         Name  Age            City
0        John   30        New York
1       Alice   25     Los Angeles
2         Bob   35         Chicago
3       Emily   28         Houston
4       David   32   San Francisco
5     Michael   27           Miami
6       Sarah   40         Seattle
7      Daniel   33          Boston
8        Emma   29          Dallas
9       James   45         Phoenix
10     Olivia   31    Philadelphia
11    William   22       San Diego
12     Sophia   37          Austin
13  Alexander   26          Denver
14        Ava   39        Portland
15      Ethan   34         Atlanta
16        Mia   24       Charlotte
17    Michael   41         Detroit
18   Isabella   28     San Antonio
19   Benjamin   36       Las Vegas
20  Charlotte   32       Nashville
21    Matthew   29     Minneapolis
22     Amelia   38         Orlando
23      Lucas   23  Salt Lake City
24     Harper   42     Kansas City
25      Henry   30    Indianapolis
26     Evelyn   35         Raleigh
27    Jackson   27  