# Starting with Pandas

Pandas is a popular Python library used for data analysis and manipulation. Here's a brief introduction to help you get started with Pandas:

1. **Installation:** You can install pandas using pip, the package installer for Python. Open your command prompt or terminal and type the following command: **pip install pandas**
2. **Importing:** Once you have installed pandas, you can import it in your Python code using the following command: **import pandas as pd**
3. **Data Structures:** Pandas provides two main data structures - Series and DataFrame.
4. **Reading Data:** Pandas allows you to read data from various file formats such as CSV, Excel, SQL, and more. You can use functions like **pd.read_csv()**, **pd.read_excel()**, **pd.read_sql()**, etc. to read data into a DataFrame.
5. **Data Manipulation:** Pandas provides a wide range of functions to manipulate data. Common operations include filtering, sorting, grouping, and merging.
6. **Data Visualization:** Pandas also provides plotting methods on Series and DataFrame objects, built on top of Matplotlib. For a DataFrame named **df**, you can use methods like **df.plot()**, **df.hist()**, **df.plot.scatter()**, etc. to visualize your data.
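
As a small sketch of the reading step, the example below parses CSV text from an in-memory buffer rather than a file on disk. The CSV content and the name **df_csv** are made up for illustration; with a real file you would pass its path to **pd.read_csv()** instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "Name,Age\nHassan,23\nSaad,25\n"

# read_csv accepts a file path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv)
print(df_csv.shape)  # (rows, columns)
```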

# Importing the Pandas library

In [2]:
import pandas as pd

# Series
A Series is a one-dimensional labeled array that can hold any data type such as integers, floats, strings, etc. It is a fundamental Pandas data structure that is used extensively in data analysis and manipulation.

In [3]:
# create a Series using a list
series_data = ["Hasnain", "Ahsan", "Hafiz Hassan", "Abdullah", "Shahvez"]

my_series = pd.Series(series_data)
print(my_series)

0         Hasnain
1           Ahsan
2    Hafiz Hassan
3        Abdullah
4         Shahvez
dtype: object


We printed the **my_series** object, and it shows the index labels (0, 1, 2, 3, 4) and the corresponding values ("Hasnain", "Ahsan", "Hafiz Hassan", "Abdullah", "Shahvez") of the Series.

In [5]:
#Creating a Series with custom index labels
index_series = pd.Series(series_data, index= ['a','b','c','d','e'])
index_series

a         Hasnain
b           Ahsan
c    Hafiz Hassan
d        Abdullah
e         Shahvez
dtype: object
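
With custom labels in place, values can be looked up either by label or by position. The sketch below rebuilds the same Series under the assumed name **labeled** and uses **.loc** for labels and **.iloc** for integer positions.

```python
import pandas as pd

names = ["Hasnain", "Ahsan", "Hafiz Hassan", "Abdullah", "Shahvez"]
labeled = pd.Series(names, index=['a', 'b', 'c', 'd', 'e'])

# .loc selects by index label, .iloc selects by integer position
first_by_label = labeled.loc['a']
first_by_position = labeled.iloc[0]

# Label-based slices include both endpoints ('b' through 'd')
subset = labeled.loc['b':'d']

print(first_by_label, first_by_position)
print(subset)
```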

In [6]:
# access values and index labels of a Series
print(my_series.values)
print(my_series.index)

['Hasnain' 'Ahsan' 'Hafiz Hassan' 'Abdullah' 'Shahvez']
RangeIndex(start=0, stop=5, step=1)


In [7]:
#Creating a Series from a dictionary (keys become the index labels)
my_dict = {'apple': 10, 'banana': 20, 'orange': 30}

my_series = pd.Series(my_dict)
print(my_series)

apple     10
banana    20
orange    30
dtype: int64


In [10]:
new_series = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

#Filter the Series to include only values greater than 20
filtered_series = new_series[new_series > 20]
print(filtered_series)

c    30
d    40
e    50
dtype: int64
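
Several conditions can be combined in one filter. A minimal sketch, using the assumed name **values** for the same data: each condition goes in its own parentheses, and **&** (and) or **|** (or) joins them, because Python's **and**/**or** keywords do not work element-wise on a Series.

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# Keep values strictly between 20 and 50
between = values[(values > 20) & (values < 50)]

# Keep values at either extreme
extremes = values[(values < 20) | (values > 40)]

print(between)
print(extremes)
```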


# Mathematical operations on Series

In [13]:
#Creating two Series
series1 = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
series2 = pd.Series([5, 10, 15, 20, 25], index=['a', 'b', 'c', 'd', 'e'])

# perform mathematical operations on the Series
sum_series = series1 + series2
diff_series = series1 - series2
product_series = series1 * series2

print(sum_series)
print(diff_series)
print(product_series)

a    15
b    30
c    45
d    60
e    75
dtype: int64
a     5
b    10
c    15
d    20
e    25
dtype: int64
a      50
b     200
c     450
d     800
e    1250
dtype: int64
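
The two Series above share the same index, so every label has a partner. When the labels differ, pandas aligns on the index and leaves NaN wherever a label is missing on one side. The sketch below uses made-up Series named **left** and **right** to show this, along with the **add()** method's **fill_value** argument as one way to avoid the NaN results.

```python
import pandas as pd

left = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
right = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

# Labels 'a' and 'd' exist on only one side, so their results are NaN
aligned_sum = left + right

# fill_value=0 treats a label missing from one side as 0 before adding
filled_sum = left.add(right, fill_value=0)

print(aligned_sum)
print(filled_sum)
```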


# DataFrame
A DataFrame is a two-dimensional labeled data structure that is used extensively in data analysis and manipulation. It is a table-like structure that contains rows and columns, where each column can have a different data type such as integers, floats, strings, etc.

In [14]:
#Creating a DataFrame from a dictionary
my_dict = {'Name': ['Hassan', 'Saad', 'Shaheer', 'Faraz'],
           'Age': [23, 25, 21, 22],
           'City': ['New York', 'London', 'Paris', 'Tokyo']}

df = pd.DataFrame(my_dict)
print(df)

      Name  Age      City
0   Hassan   23  New York
1     Saad   25    London
2  Shaheer   21     Paris
3    Faraz   22     Tokyo
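
The statement that each column can have its own data type can be checked with the **dtypes** attribute. A minimal sketch, rebuilding the same table under the assumed name **people**: Age comes out as an integer column, while Name and City hold strings.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Hassan', 'Saad', 'Shaheer', 'Faraz'],
                       'Age': [23, 25, 21, 22],
                       'City': ['New York', 'London', 'Paris', 'Tokyo']})

# One dtype per column
print(people.dtypes)

# shape is (number of rows, number of columns)
print(people.shape)
```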


In [29]:
#Creating a DataFrame from a list of dictionaries
my_list = [{'Name': 'Hafiz Hassan Mustafa', 'Age': 23, 'City': 'Karachi'}, {'Name': 'Saad', 'Age': 25, 'City': 'London'},
           {'Name': 'Shaheer', 'Age': 21, 'City': 'Paris'}, {'Name': 'Faraz', 'Age': 22, 'City': 'Tokyo'}]

df = pd.DataFrame(my_list)
print(df)

                   Name  Age     City
0  Hafiz Hassan Mustafa   23  Karachi
1                  Saad   25   London
2               Shaheer   21    Paris
3                 Faraz   22    Tokyo


In [30]:
# access columns and rows of a DataFrame
print(df.columns)
print(df.index)
print(df.iloc[0:3])

Index(['Name', 'Age', 'City'], dtype='object')
RangeIndex(start=0, stop=4, step=1)
                   Name  Age     City
0  Hafiz Hassan Mustafa   23  Karachi
1                  Saad   25   London
2               Shaheer   21    Paris
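
Besides slicing rows with **iloc**, individual columns and single cells can be selected directly. The sketch below rebuilds the table under the assumed name **df_people** and shows three common forms of selection.

```python
import pandas as pd

df_people = pd.DataFrame([
    {'Name': 'Hafiz Hassan Mustafa', 'Age': 23, 'City': 'Karachi'},
    {'Name': 'Saad', 'Age': 25, 'City': 'London'},
    {'Name': 'Shaheer', 'Age': 21, 'City': 'Paris'},
    {'Name': 'Faraz', 'Age': 22, 'City': 'Tokyo'},
])

# A single column name returns a Series
ages = df_people['Age']

# A list of column names returns a DataFrame
name_city = df_people[['Name', 'City']]

# .loc[row_label, column_label] returns a single value
saad_city = df_people.loc[1, 'City']

print(ages)
print(name_city)
print(saad_city)
```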


In [31]:
#Filter rows based on a condition
filtered_df = df[df['Age'] > 22]
print(filtered_df)

                   Name  Age     City
0  Hafiz Hassan Mustafa   23  Karachi
1                  Saad   25   London


In [32]:
#Adding a new column to the DataFrame
salary = [50000, 60000, 55000, 70000]

df['Salary'] = salary
print(df)

                   Name  Age     City  Salary
0  Hafiz Hassan Mustafa   23  Karachi   50000
1                  Saad   25   London   60000
2               Shaheer   21    Paris   55000
3                 Faraz   22    Tokyo   70000


In [33]:
#Sorting the DataFrame by a column
sorted_df = df.sort_values('Age')
print(sorted_df)

                   Name  Age     City  Salary
2               Shaheer   21    Paris   55000
3                 Faraz   22    Tokyo   70000
0  Hafiz Hassan Mustafa   23  Karachi   50000
1                  Saad   25   London   60000
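
Note that the sorted output keeps the original row labels (2, 3, 0, 1). The sketch below, using a trimmed-down table under the assumed name **pay**, sorts in descending order with **ascending=False** and then renumbers the rows with **reset_index(drop=True)**.

```python
import pandas as pd

pay = pd.DataFrame({'Name': ['Hafiz Hassan Mustafa', 'Saad', 'Shaheer', 'Faraz'],
                    'Salary': [50000, 60000, 55000, 70000]})

# Highest salary first; the original row labels travel with the rows
sorted_desc = pay.sort_values('Salary', ascending=False)

# Replace the shuffled labels with a fresh 0..n-1 index
renumbered = sorted_desc.reset_index(drop=True)

print(sorted_desc)
print(renumbered)
```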


In [35]:
#Grouping the DataFrame by a column and calculating the mean
#numeric_only=True skips text columns such as Name, which cannot be averaged
grouped_df = df.groupby('City').mean(numeric_only=True)
print(grouped_df)

          Age   Salary
City                  
Karachi  23.0  50000.0
London   25.0  60000.0
Paris    21.0  55000.0
Tokyo    22.0  70000.0
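
Every city in the table above appears only once, so each group mean is simply that row's value. Grouping becomes more informative when groups contain several rows. The sketch below uses made-up data under the assumed name **staff** and named aggregation with **agg()** to compute a different statistic per column.

```python
import pandas as pd

# Hypothetical data in which London appears twice
staff = pd.DataFrame({'City': ['London', 'London', 'Paris'],
                      'Age': [25, 35, 21],
                      'Salary': [60000, 80000, 55000]})

# Each keyword is an output column: (input column, aggregation)
summary = staff.groupby('City').agg(
    mean_age=('Age', 'mean'),
    max_salary=('Salary', 'max'),
)

print(summary)
```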
