# Pandas Basics

## Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.


In [1]:
import pandas as pd

In [2]:
data = pd.Series([4,8,15,16,23,42])

In [3]:
print(data)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


In [4]:
type(data)

pandas.core.series.Series

## Q2. Create a variable of list type containing 10 elements, apply the pandas.Series function to it, and print the result.

In [5]:
import numpy as np

data = np.random.randint(20,50,size=10)

In [6]:
data

array([24, 38, 46, 46, 33, 37, 26, 25, 24, 41])

In [7]:
s = pd.Series(data)

In [8]:
s

0    24
1    38
2    46
3    46
4    33
5    37
6    26
7    25
8    24
9    41
dtype: int32

In [9]:
type(s)

pandas.core.series.Series

In [10]:
type(data)

numpy.ndarray
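
The question asks for a list, while the cells above pass a NumPy array; `pd.Series` accepts either. A minimal sketch with a plain Python list follows. The names `values` and `s_from_list` are illustrative and not part of the original notebook, and the numbers simply reuse the random draw shown above.

```python
import pandas as pd

# A plain Python list with 10 elements (values reused from the output above)
values = [24, 38, 46, 46, 33, 37, 26, 25, 24, 41]

# pd.Series accepts a list directly and assigns a default 0..9 integer index
s_from_list = pd.Series(values)

print(type(values))
print(s_from_list)
```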

## Q3. Create a Pandas DataFrame that contains the following data:

Name Alice Bob Claire

Age 25 30 27

Gender Female Male Female

In [11]:
data = { 'Name':['Alice','Bob','Claire'], 'Age': [25,30,27], 'Gender' : ['Female','Male','Female']}


In [12]:
data

{'Name': ['Alice', 'Bob', 'Claire'],
 'Age': [25, 30, 27],
 'Gender': ['Female', 'Male', 'Female']}

In [13]:
df = pd.DataFrame(data)

In [14]:
df

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


In [15]:
type(df)

pandas.core.frame.DataFrame

## Q4. What is a ‘DataFrame’ in pandas and how is it different from a pandas.Series? Explain with an example.

In Pandas, a DataFrame is a two-dimensional, tabular data structure with labeled rows and columns. Each column holds a single data type, but different columns can hold different types (numeric, string, boolean, etc.). It is one of the most commonly used data structures in Pandas and is similar to a spreadsheet or a SQL table. A DataFrame is designed to handle structured data and provides methods for data manipulation, analysis, and cleaning.

A Pandas Series, on the other hand, is a one-dimensional, array-like data structure that holds a single column of data with an associated index. Think of it as a labeled array. It resembles a list or a NumPy array, but its index allows values to be looked up and aligned by label. Each column of a DataFrame is itself a Series, so selecting one column returns a Series, while selecting a list of columns returns a DataFrame.

In [16]:
data = { 'Name':['Alice','Bob','Claire'], 'Age': [25,30,27], 'Gender' : ['Female','Male','Female']}

In [17]:
data

{'Name': ['Alice', 'Bob', 'Claire'],
 'Age': [25, 30, 27],
 'Gender': ['Female', 'Male', 'Female']}

In [18]:
df = pd.DataFrame(data)

In [19]:
df

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


In [20]:
type(df)

pandas.core.frame.DataFrame

In [21]:
df['Age']

0    25
1    30
2    27
Name: Age, dtype: int64

In [22]:
df['Gender']

0    Female
1      Male
2    Female
Name: Gender, dtype: object

In [23]:
df['Name']

0     Alice
1       Bob
2    Claire
Name: Name, dtype: object

In [24]:
type(df[['Name','Gender']])

pandas.core.frame.DataFrame
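
The difference also shows up in the dimensions of each object. The sketch below rebuilds the same table and compares `ndim` and `shape`; it is an illustration added here, and the names `age` and `subset` are not from the original notebook.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female'],
})

age = df['Age']        # single brackets: one column, returned as a Series
subset = df[['Age']]   # double brackets: a one-column DataFrame

print(age.ndim, age.shape)        # 1 (3,)
print(subset.ndim, subset.shape)  # 2 (3, 1)
print(df.ndim, df.shape)          # 2 (3, 3)
```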

## Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

read_csv(): Reads a CSV file into a DataFrame.

head() and tail(): Returns the first or last n rows of the DataFrame.

info(): Provides a concise summary of the DataFrame.

describe(): Generates descriptive statistics of the DataFrame.

dropna(): Removes missing values.

fillna(): Fills missing values.

drop(): Drops specified labels from rows or columns.

groupby(): Groups DataFrame using a mapper or by a Series of columns.

merge(): Merges DataFrame or named Series objects with a database-style join.

pivot_table(): Creates a spreadsheet-style pivot table as a DataFrame.

apply(): Applies a function along an axis of the DataFrame.

sort_values(): Sorts by the values along either axis.

iloc[] and loc[]: iloc[] selects rows and columns by integer position, while loc[] selects them by label (or by a boolean mask).

set_index(): Sets the DataFrame index using existing columns.

reset_index(): Resets the index of the DataFrame to the default integer index.

value_counts(): Returns a Series containing counts of unique values.

In [25]:
import pandas as pd

data = {
    'Product': ['A', 'B', 'A', 'C', 'B', 'A'],
    'Category': ['Electronics', 'Furniture', 'Electronics', 'Kitchen', 'Furniture', 'Electronics'],
    'Sales': [200, 150, 300, 250, 100, 400]
}

df = pd.DataFrame(data)

# Group by 'Category' and calculate the sum of 'Sales' for each category
category_sales = df.groupby('Category')['Sales'].sum()

print(category_sales)


Category
Electronics    900
Furniture      250
Kitchen        250
Name: Sales, dtype: int64
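
A few of the other functions listed above can be sketched on the same sales data. The column names come from the example above; the particular filters and slices are illustrative choices, not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'A', 'C', 'B', 'A'],
    'Category': ['Electronics', 'Furniture', 'Electronics',
                 'Kitchen', 'Furniture', 'Electronics'],
    'Sales': [200, 150, 300, 250, 100, 400],
})

# sort_values(): order rows by Sales, highest first
top = df.sort_values('Sales', ascending=False)

# value_counts(): how many rows each product has
product_counts = df['Product'].value_counts()

# loc[] selects by label or boolean mask; iloc[] selects by integer position
by_label = df.loc[df['Sales'] > 200, ['Product', 'Sales']]
by_position = df.iloc[0:2]

print(top.head(3))
print(product_counts)
print(by_label)
print(by_position)
```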


## Q6. Which of the following is mutable in nature: Series, DataFrame, Panel?

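Series and DataFrame are both mutable. Panel was mutable as well, but it was deprecated in pandas 0.20 and has since been removed, so it is not available in current versions.

Series: A Pandas Series is value-mutable, meaning you can change its elements after it has been created. The pandas documentation describes all of its data structures as value-mutable, but notes that the length of a Series cannot be changed, whereas columns can be inserted into a DataFrame.

DataFrame: A Pandas DataFrame is also mutable. You can change the values of individual cells, add or remove columns, and perform various data manipulations on the DataFrame.

Panel: A Panel was a three-dimensional data structure in older versions of Pandas, which could be thought of as a dictionary of DataFrames. Its values could be changed in the same way as those of a Series or DataFrame. Since its removal, three-dimensional data is usually represented with a MultiIndex DataFrame or with a separate library such as xarray.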

In [26]:
data = {"Name" : ["Alice","Bob","Claire"],"Age" : [25,30,27],"Gender" : ["Female","Male","Female"]}

In [27]:

df = pd.DataFrame(data)

In [28]:
df

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


In [29]:
series_age = df['Age']
print(series_age)
print("\n")
print(type(series_age))

0    25
1    30
2    27
Name: Age, dtype: int64


<class 'pandas.core.series.Series'>


Example of Mutability in Series and DataFrame

Series:

In [31]:
import pandas as pd

# Create a Series
s = pd.Series([1, 2, 3, 4, 5])

# Modify the first element
s[0] = 10

print(s)


0    10
1     2
2     3
3     4
4     5
dtype: int64


DataFrame:

In [32]:
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Modify an element
df.loc[0, 'A'] = 10

print(df)


    A  B
0  10  4
1   2  5
2   3  6
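
Mutability also covers the shape of a DataFrame, since columns can be added or removed in place. A minimal sketch, assuming the same small DataFrame as above (the column name `C` is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Add a new column in place, computed from the existing ones
df['C'] = df['A'] + df['B']

# Remove a column in place
del df['B']

print(df)
```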


## Q7. Create a DataFrame using multiple Series. Explain with an example.

In [3]:
import pandas as pd
names = pd.Series(["ram","sham","Pratham","rohit"])
age = pd.Series([22,21,22,20])
gender = pd.Series(['M','M','M','F'])

In [4]:
names

0        ram
1       sham
2    Pratham
3      rohit
dtype: object

In [5]:
age

0    22
1    21
2    22
3    20
dtype: int64

In [6]:
gender

0    M
1    M
2    M
3    F
dtype: object

In [7]:
df = pd.DataFrame({"Names":names,"Age":age,"Gender":gender})

In [8]:
df

     Names  Age  Gender
0      ram   22       M
1     sham   21       M
2  Pratham   22       M
3    rohit   20       F
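
When a dictionary of Series is passed to `pd.DataFrame`, each key becomes a column name and the rows are aligned on the Series index. The sketch below uses made-up index labels (`a`, `b`, `c`) to show that alignment, with `NaN` filling any label a Series lacks; `pd.concat` with `axis=1` is shown as one alternative way to combine the same Series.

```python
import pandas as pd

names = pd.Series(['ram', 'sham', 'rohit'], index=['a', 'b', 'c'])
age = pd.Series([22, 21], index=['a', 'b'])  # no entry for label 'c'

# Rows are matched by index label, not by position
df = pd.DataFrame({'Names': names, 'Age': age})
print(df)

# pd.concat along axis=1 places the Series side by side as columns
df_concat = pd.concat([names.rename('Names'), age.rename('Age')], axis=1)
print(df_concat)
```

In the notebook example above, all three Series share the default 0 to 3 index, so every row lines up and no missing values appear.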
