Overview
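This repository contains Python code (a Jupyter notebook using pandas) for analyzing a small sample table with columns for name, marks, and gender. The notebook walks through common pandas operations on this dataset: inspecting its structure, checking for missing values, computing summary statistics, filtering, transforming, and sorting.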

Dataset Description
The dataset comprises the following columns:

•	Name: The name of the individual.

•	Marks: Marks obtained by the individual.

•	Gender: Gender of the individual.

Analysis Steps
1.	Data Loading: Load the dataset into a pandas DataFrame.
2.	Data Exploration: Explore the dataset to understand its structure and content.
3.	Data Cleaning: Clean the dataset by handling missing values, outliers, or any inconsistencies.
4.	Descriptive Statistics: Compute descriptive statistics such as mean, median, mode, standard deviation, etc.
5.	Statistical Analysis: Conduct statistical analysis to uncover relationships and insights within the data.
6.	Conclusion: Summarize the findings and insights obtained from the analysis.
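
Step 4 mentions the mode, which the notebook below does not compute directly. A minimal sketch, assuming the same six-row sample table used throughout, that computes the mean, median, mode, and standard deviation with standard pandas Series methods:

```python
import pandas as pd

# Same sample data as the notebook below
df = pd.DataFrame({
    'Name': ['Sam', 'John', 'Tara', 'David', 'Asha', 'Mark'],
    'Marks': [99, 98, 95, 93, 88, 90],
    'Gender': ['Male', 'Male', 'Female', 'Male', 'Female', 'Male'],
})

marks = df['Marks']
mean = marks.mean()      # arithmetic mean
median = marks.median()  # average of the two middle values (even number of rows)
std = marks.std()        # sample standard deviation (ddof=1), as reported by describe()

# Every mark is unique, so the mode is more informative for the categorical column
gender_mode = df['Gender'].mode()[0]

print(f"mean={mean:.2f}, median={median}, std={std:.3f}, mode(Gender)={gender_mode}")
```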

Requirements
•	Python 3.x

•	Pandas

•	NumPy


In [1]:
import pandas as pd

In [25]:
dict1={'Name':['Sam','John','Tara','David','Asha','Mark'],
       'Marks':[99,98,95,93,88,90],
       'Gender':['Male','Male','Female','Male','Female','Male']}

df1=pd.DataFrame(dict1)
df1

    Name  Marks  Gender
0    Sam     99    Male
1   John     98    Male
2   Tara     95  Female
3  David     93    Male
4   Asha     88  Female
5   Mark     90    Male
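
In practice the data would usually be loaded from a file rather than typed in as a dictionary. A sketch, assuming a CSV file with Name, Marks, and Gender headers; io.StringIO stands in for the file here, and the file name students.csv in the comment is hypothetical:

```python
import io
import pandas as pd

# Hypothetical CSV contents; with a real file this would be pd.read_csv('students.csv')
csv_text = """Name,Marks,Gender
Sam,99,Male
John,98,Male
Tara,95,Female
David,93,Male
Asha,88,Female
Mark,90,Male
"""

df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv.shape)   # same 6 rows and 3 columns as df1
print(df_csv.dtypes)  # Marks is parsed as an integer column
```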


**1. Display the first 3 rows of the dataset**

In [26]:
df1.head(3)

   Name  Marks  Gender
0   Sam     99    Male
1  John     98    Male
2  Tara     95  Female


**2. Check the last 2 rows of the dataset**

In [27]:
df1.tail(2)

   Name  Marks  Gender
4  Asha     88  Female
5  Mark     90    Male
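
head(n) and tail(n) are positional shortcuts. A short sketch, using the same sample data, showing that they return the same rows as the equivalent iloc slices:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Sam', 'John', 'Tara', 'David', 'Asha', 'Mark'],
    'Marks': [99, 98, 95, 93, 88, 90],
    'Gender': ['Male', 'Male', 'Female', 'Male', 'Female', 'Male'],
})

first3 = df1.head(3)
last2 = df1.tail(2)

# head(n) matches the positional slice iloc[:n]; tail(n) matches iloc[-n:]
assert first3.equals(df1.iloc[:3])
assert last2.equals(df1.iloc[-2:])
print(first3['Name'].tolist(), last2['Name'].tolist())
```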


**3. Find the shape of the dataset**

In [28]:
df1.shape

(6, 3)

In [8]:
print('Number of rows:', df1.shape[0])
print('Number of columns:', df1.shape[1])

Number of rows: 6
Number of columns: 3


**4. Get information about the dataset, such as the number of rows and columns and the data types**

In [29]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    6 non-null      object
 1   Marks   6 non-null      int64 
 2   Gender  6 non-null      object
dtypes: int64(1), object(2)
memory usage: 272.0+ bytes
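
info() prints its summary but returns None. When the same information is needed as a value, the dtypes attribute and select_dtypes() provide it. A short sketch on the same data:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Sam', 'John', 'Tara', 'David', 'Asha', 'Mark'],
    'Marks': [99, 98, 95, 93, 88, 90],
    'Gender': ['Male', 'Male', 'Female', 'Male', 'Female', 'Male'],
})

print(df1.dtypes)  # one dtype per column

# Keep only the numeric columns, e.g. before computing statistics
numeric_cols = df1.select_dtypes(include='number').columns.tolist()
print(numeric_cols)
```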


**5. Check for null values**

In [30]:
df1.isnull().sum()  # null values per column

Name      0
Marks     0
Gender    0
dtype: int64

In [31]:
df1.isnull().sum(axis=1)  # null values per row

0    0
1    0
2    0
3    0
4    0
5    0
dtype: int64
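
The sample table has no missing values, so both counts above are zero. A sketch of what the Data Cleaning step could look like, assuming a hypothetical copy of the data in which one mark is missing: isnull() flags it, and either dropna() or fillna() handles it (filling with the median is one illustrative choice, not the only one).

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the data with Tara's mark missing
df_missing = pd.DataFrame({
    'Name': ['Sam', 'John', 'Tara', 'David', 'Asha', 'Mark'],
    'Marks': [99, 98, np.nan, 93, 88, 90],
    'Gender': ['Male', 'Male', 'Female', 'Male', 'Female', 'Male'],
})

print(df_missing.isnull().sum())  # Marks now reports 1 missing value

# Option 1: drop any row that contains a missing value
dropped = df_missing.dropna()

# Option 2: fill the gap with the median of the remaining marks
filled = df_missing.fillna({'Marks': df_missing['Marks'].median()})
print(len(dropped), filled.loc[2, 'Marks'])
```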

**6. Get overall statistics for the DataFrame**

In [32]:
df1.describe()  # by default, describe() summarizes numeric columns only
# 25% of the marks are at or below 90.75; 50% (the median) are at or below 94

           Marks
count   6.000000
mean   93.833333
std     4.355074
min    88.000000
25%    90.750000
50%    94.000000
75%    97.250000
max    99.000000
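
The quartiles above come from linear interpolation between sorted values, the default method of Series.quantile(): the position is q × (n − 1), so for q = 0.25 and n = 6 it is 1.25, giving 90 + 0.25 × (93 − 90) = 90.75. A worked sketch that reproduces the values by hand; linear_quantile is an illustrative helper, not a pandas function:

```python
import pandas as pd

marks = pd.Series([99, 98, 95, 93, 88, 90])

def linear_quantile(values, q):
    """Quantile by linear interpolation between the two nearest sorted values."""
    s = sorted(values)
    pos = q * (len(s) - 1)          # fractional position in the sorted list
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

for q in (0.25, 0.5, 0.75):
    print(q, linear_quantile(marks, q), marks.quantile(q))
```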


In [33]:
df1.describe(include='all')  # statistics for all columns; NaN where a statistic does not apply (e.g. mean and std for categorical columns)

        Name      Marks  Gender
count      6   6.000000       6
unique     6        NaN       2
top      Sam        NaN    Male
freq       1        NaN       4
mean     NaN  93.833333     NaN
std      NaN   4.355074     NaN
min      NaN  88.000000     NaN
25%      NaN  90.750000     NaN
50%      NaN  94.000000     NaN
75%      NaN  97.250000     NaN
max      NaN  99.000000     NaN


**7. Find the unique values in the Gender column**

In [36]:
df1['Gender'].unique()

array(['Male', 'Female'], dtype=object)

**8. Find the number of unique values in the Gender column**

In [37]:
df1['Gender'].nunique()

2

**9. Display the count of each unique value in the Gender column**

In [38]:
df1['Gender'].value_counts()

Male      4
Female    2
Name: Gender, dtype: int64
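
value_counts() can also report proportions instead of counts via normalize=True. A short sketch on the same Gender values:

```python
import pandas as pd

gender = pd.Series(['Male', 'Male', 'Female', 'Male', 'Female', 'Male'], name='Gender')

counts = gender.value_counts()                # absolute counts
shares = gender.value_counts(normalize=True)  # fractions that sum to 1
print(counts['Male'], counts['Female'])
print(shares.round(3))
```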

**10. Find the total number of students with marks between 90 and 95 (inclusive), using the between method**

In [39]:
# without the between method: first, filter marks >= 90
df1[df1['Marks']>=90]

    Name  Marks  Gender
0    Sam     99    Male
1   John     98    Male
2   Tara     95  Female
3  David     93    Male
5   Mark     90    Male


In [42]:
len(df1[(df1['Marks']>=90) & (df1['Marks']<=95)])

3

In [44]:
#between method
df1['Marks'].between(90, 95).sum()  # summing the boolean mask counts the True values

3
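
Another option is DataFrame.query(), which accepts a chained comparison as a string; both bounds are inclusive here because the expression uses <=. A sketch on the same data, checked against between():

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Sam', 'John', 'Tara', 'David', 'Asha', 'Mark'],
    'Marks': [99, 98, 95, 93, 88, 90],
    'Gender': ['Male', 'Male', 'Female', 'Male', 'Female', 'Male'],
})

in_range = df1.query('90 <= Marks <= 95')  # rows with 90 <= Marks <= 95
print(in_range['Name'].tolist(), len(in_range))

# between() builds the same boolean mask
assert in_range.equals(df1[df1['Marks'].between(90, 95)])
```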

**11. Find the average marks**

In [46]:
df1['Marks'].mean()

93.83333333333333
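
The overall mean hides differences between groups. A sketch of the Statistical Analysis step, assuming a comparison by gender is of interest, using groupby() to get the count and average marks per group. With only six students this is a description of the sample, not evidence of a real difference.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Sam', 'John', 'Tara', 'David', 'Asha', 'Mark'],
    'Marks': [99, 98, 95, 93, 88, 90],
    'Gender': ['Male', 'Male', 'Female', 'Male', 'Female', 'Male'],
})

# One row per gender, with the number of students and their mean mark
by_gender = df1.groupby('Gender')['Marks'].agg(['count', 'mean'])
print(by_gender)
```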

**12. The apply method**

In [52]:
# using a user-defined function instead of a lambda
# note: // is floor division, so 99 // 2 gives 49 (not 49.5)
def marks(x):
    return x // 2

In [53]:
df1['Half_Marks']=df1['Marks'].apply(marks)

In [54]:
df1

    Name  Marks  Gender  Half_Marks
0    Sam     99    Male          49
1   John     98    Male          49
2   Tara     95  Female          47
3  David     93    Male          46
4   Asha     88  Female          44
5   Mark     90    Male          45


In [55]:
df1['Marks'].apply(lambda x:x/2)

0    49.5
1    49.0
2    47.5
3    46.5
4    44.0
5    45.0
Name: Marks, dtype: float64
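
For simple arithmetic, apply() is not required: pandas operators work on the whole column at once and are usually faster. A sketch showing that the vectorized // and / operators give the same results as the two apply() calls above; half_floor is an illustrative name standing in for the marks function:

```python
import pandas as pd

marks = pd.Series([99, 98, 95, 93, 88, 90], name='Marks')

def half_floor(x):
    return x // 2

via_apply = marks.apply(half_floor)  # element-by-element Python calls
via_operator = marks // 2            # vectorized floor division
assert via_apply.equals(via_operator)

half = marks / 2                     # vectorized true division, float result
print(half.tolist())
```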

**13. The map method**

In [59]:
df1['Male_Female']=df1['Gender'].map({'Male':1,'Female':0})  # convert the categorical column to binary (Male -> 1, Female -> 0)

In [60]:
df1

    Name  Marks  Gender  Half_Marks  Male_Female
0    Sam     99    Male          49            1
1   John     98    Male          49            1
2   Tara     95  Female          47            0
3  David     93    Male          46            1
4   Asha     88  Female          44            0
5   Mark     90    Male          45            1
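
One behavior of map() worth knowing: values that are not keys in the dictionary become NaN instead of raising an error. A sketch, assuming a hypothetical extra category 'Other' that the mapping does not cover:

```python
import pandas as pd

gender = pd.Series(['Male', 'Female', 'Other'])  # 'Other' is a hypothetical value
encoded = gender.map({'Male': 1, 'Female': 0})

print(encoded.tolist())        # the unmapped value is NaN, which forces a float dtype
print(encoded.isnull().sum())  # number of unmapped values
```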


**14. Drop columns**

In [61]:
df1.drop('Male_Female', axis=1)  # returns a new DataFrame; df1 itself is unchanged

    Name  Marks  Gender  Half_Marks
0    Sam     99    Male          49
1   John     98    Male          49
2   Tara     95  Female          47
3  David     93    Male          46
4   Asha     88  Female          44
5   Mark     90    Male          45


In [65]:
df1.drop(['Male_Female','Half_Marks'], axis=1, inplace=True)  # inplace=True modifies the existing DataFrame

In [66]:
df1

    Name  Marks  Gender
0    Sam     99    Male
1   John     98    Male
2   Tara     95  Female
3  David     93    Male
4   Asha     88  Female
5   Mark     90    Male
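
drop() also accepts a columns= keyword, which avoids having to remember that axis=1 means columns. Reassigning the returned DataFrame is a common alternative to inplace=True. A sketch, assuming a fresh copy of the data with the two helper columns re-added:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sam', 'John', 'Tara', 'David', 'Asha', 'Mark'],
    'Marks': [99, 98, 95, 93, 88, 90],
    'Gender': ['Male', 'Male', 'Female', 'Male', 'Female', 'Male'],
})
df['Half_Marks'] = df['Marks'] // 2
df['Male_Female'] = df['Gender'].map({'Male': 1, 'Female': 0})

# Equivalent to df.drop(['Male_Female', 'Half_Marks'], axis=1)
trimmed = df.drop(columns=['Male_Female', 'Half_Marks'])
print(trimmed.columns.tolist())  # only the original three columns
print(df.columns.tolist())       # df itself still has all five columns
```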


**15. Print the column names and the row index**

In [67]:
df1.columns

Index(['Name', 'Marks', 'Gender'], dtype='object')

In [68]:
df1.index  # row labels

RangeIndex(start=0, stop=6, step=1)

**16. Sort the DataFrame by the Marks column**

In [70]:
df1.sort_values(by='Marks')

    Name  Marks  Gender
4   Asha     88  Female
5   Mark     90    Male
3  David     93    Male
2   Tara     95  Female
1   John     98    Male
0    Sam     99    Male


In [72]:
#in descending order
df1.sort_values(by='Marks', ascending=False)

    Name  Marks  Gender
0    Sam     99    Male
1   John     98    Male
2   Tara     95  Female
3  David     93    Male
5   Mark     90    Male
4   Asha     88  Female


In [74]:
# sort by Marks, then by Gender (both descending)
df1.sort_values(by=['Marks','Gender'], ascending=False)

    Name  Marks  Gender
0    Sam     99    Male
1   John     98    Male
2   Tara     95  Female
3  David     93    Male
5   Mark     90    Male
4   Asha     88  Female
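
The ascending parameter also accepts a list with one flag per sort key. A sketch, assuming we want genders in alphabetical order and, within each gender, the highest marks first:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Sam', 'John', 'Tara', 'David', 'Asha', 'Mark'],
    'Marks': [99, 98, 95, 93, 88, 90],
    'Gender': ['Male', 'Male', 'Female', 'Male', 'Female', 'Male'],
})

# Gender ascending (Female before Male), then Marks descending within each gender
ranked = df1.sort_values(by=['Gender', 'Marks'], ascending=[True, False])
print(ranked[['Name', 'Gender', 'Marks']])
```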


**17. Display the name and marks of female students**

In [82]:
df1[df1['Gender']=='Female'][['Name','Marks']]

   Name  Marks
2  Tara     95
4  Asha     88


In [81]:
df1[df1['Gender'].isin(['Female'])][['Name','Marks']]  # using the pandas isin method

   Name  Marks
2  Tara     95
4  Asha     88
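
Both cells above use chained indexing: a row filter followed by a separate column selection. A single .loc call does both in one step, which is the form the pandas indexing guide suggests, particularly if the selection is later assigned to. A sketch on the same data:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Sam', 'John', 'Tara', 'David', 'Asha', 'Mark'],
    'Marks': [99, 98, 95, 93, 88, 90],
    'Gender': ['Male', 'Male', 'Female', 'Male', 'Female', 'Male'],
})

# Row mask and column list in one .loc call
females = df1.loc[df1['Gender'] == 'Female', ['Name', 'Marks']]
print(females)
```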


Conclusion

After analyzing the dataset, we found that the average mark is about 93.83, with a sample standard deviation of about 4.36 (the std value reported by describe()). Marks range from 88 to 99, so all six students scored 88 or above, and the group contains four male and two female students.
