# Head & Tail Functions in Pandas

The head() and tail() methods return a specified number of rows from the beginning or end of a DataFrame (5 by default). They are helpful for quickly inspecting the data to get a sense of its structure and contents.

In [5]:
import pandas as pd

df = pd.read_csv("C:/Users/Vishal/Desktop/Data/CarSalesData.csv")

df

# head(): returns the first rows of the DataFrame

df.head()  # no argument, so the default of 5 rows is returned

df.head(2)  # first 2 rows only

,ClientID,ClientName,Address1,Address2,Town,County,PostCode,Region,OuterPostode,CountryID,ClientType,ClientSize,ClientSince,IsCreditWorthy,IsDealer
0,1,Aldo Motors,"4, Scale Street",,Uttoxeter,Staffs,ST17 99RZ,East Midlands,ST,1,Wholesaler,Large,04-01-1998 00:00,1,1
1,2,Honest John,99a Baker Street,,London,,NSW1 1A,Greater London Authority,EC,1,Dealer,Large,01-01-2000 00:00,0,0


In [7]:
# tail(): returns the last rows of the DataFrame

df.tail()  # no argument, so the default of 5 rows is returned

df.tail(2)  # last 2 rows only

,ClientID,ClientName,Address1,Address2,Town,County,PostCode,Region,OuterPostode,CountryID,ClientType,ClientSize,ClientSince,IsCreditWorthy,IsDealer
29,30,British Luxury Automobile Corp,2555 Meridian Blvd,,Franklin,Tennesee,,,TN,3,Dealer,Large,04-01-2013,1,1
30,31,Classy Car Sales,30 Isabella St,,Pittsburgh,Pennsylvania,,,PA,3,Wholesaler,Small,04-01-2013,1,0
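
As a self-contained illustration that does not depend on the CSV file above, here is a minimal sketch of how the `n` argument behaves. The `toy` DataFrame and its values are invented for this example. It also shows negative values of `n`, which pandas treats as "all rows except the last (for head) or first (for tail) |n| rows".

```python
import pandas as pd

# Hypothetical toy data, not taken from CarSalesData.csv
toy = pd.DataFrame({"ClientID": range(1, 11)})

first_default = toy.head()         # first 5 rows (default n=5)
last_three = toy.tail(3)           # last 3 rows
all_but_last_two = toy.head(-2)    # every row except the last 2
all_but_first_two = toy.tail(-2)   # every row except the first 2

print(len(first_default))
print(list(last_three["ClientID"]))
print(len(all_but_last_two))
print(list(all_but_first_two["ClientID"]))
```

Both methods return a new DataFrame and leave the original untouched, so they are safe to call at any point while exploring.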


# The describe() Function in Pandas

In Pandas, the describe() function generates summary statistics for a DataFrame or Series. It gives a quick overview of the central tendency, dispersion, and shape of a dataset's distribution, excluding NaN values. For a DataFrame, only the numeric columns are summarized by default.

The statistics provided by describe() include:

Count: Number of non-null entries in each column.

Mean: Arithmetic mean (average) of the values.

Std: Sample standard deviation (computed with n - 1 in the denominator), which measures the spread or dispersion of the data.

Min: Minimum value in the column.

25%: 25th percentile value (Q1), which is the value below which 25% of the data falls.

50%: 50th percentile value (Q2 or median), which is the middle value of the dataset.

75%: 75th percentile value (Q3), which is the value below which 75% of the data falls.

Max: Maximum value in the column.
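
To make the list above concrete, the following sketch recomputes each statistic with the corresponding Series method and compares it with what describe() reports. It uses the Math marks from the student_marks example in this section; the names `math_marks`, `summary`, and `by_hand` are chosen here purely for illustration.

```python
import pandas as pd

# Math marks, the same values as in the student_marks example
math_marks = pd.Series([25, 26, 21, 29, 30], name="Math")

summary = math_marks.describe()

# Each entry of describe() corresponds to an ordinary Series method
by_hand = {
    "count": math_marks.count(),
    "mean": math_marks.mean(),
    "std": math_marks.std(),            # sample standard deviation (ddof=1)
    "min": math_marks.min(),
    "25%": math_marks.quantile(0.25),
    "50%": math_marks.median(),
    "75%": math_marks.quantile(0.75),
    "max": math_marks.max(),
}

for label, value in by_hand.items():
    print(f"{label:>5}: {value:.6f}  (describe: {summary[label]:.6f})")
```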

In [23]:
student_marks = {
    "ID":[1,2,3,4,5],
    "Name":["Ajay","Vishal","Ravi","Nita","Shital"],
    "Math":[25,26,21,29,30],
    "Chem":[26,26,25,28,30],
    "Bio":[25,26,22,25,28],
    "Phys":[26,28,21,27,30],
}

df1 = pd.DataFrame(student_marks) 
df1

df1.describe()



,ID,Math,Chem,Bio,Phys
count,5.0,5.0,5.0,5.0,5.0
mean,3.0,26.2,27.0,25.2,26.4
std,1.581139,3.563706,2.0,2.167948,3.361547
min,1.0,21.0,25.0,22.0,21.0
25%,2.0,25.0,26.0,25.0,26.0
50%,3.0,26.0,26.0,25.0,27.0
75%,4.0,29.0,28.0,26.0,28.0
max,5.0,30.0,30.0,28.0,30.0
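
The Name column is missing from the output above because describe() summarizes only numeric columns of a DataFrame unless told otherwise. The sketch below, which reuses two columns of the student marks under the illustrative name `marks`, shows one way to bring text columns into the summary: calling describe() on the column itself, or passing `include="all"`.

```python
import pandas as pd

# Two columns from the student marks example, kept small for illustration
marks = pd.DataFrame({
    "Name": ["Ajay", "Vishal", "Ravi", "Nita", "Shital"],
    "Math": [25, 26, 21, 29, 30],
})

numeric_only = marks.describe()              # default: numeric columns only
name_summary = marks["Name"].describe()      # text column: count, unique, top, freq
everything = marks.describe(include="all")   # both kinds; inapplicable cells are NaN

print(list(numeric_only.columns))
print(list(name_summary.index))
print(everything)
```

With `include="all"`, statistics that do not apply to a column (for example the mean of Name) appear as NaN rather than being dropped.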


# df.info() and df.shape in Pandas

df.info():

This method prints a summary of the DataFrame's structure: the index range, each column's name, non-null count, and data type, and the total memory usage.


df.shape:

This attribute returns a tuple giving the dimensions of the DataFrame as (number of rows, number of columns).

In [15]:
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31 entries, 0 to 30
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   ClientID        31 non-null     int64  
 1   ClientName      31 non-null     object 
 2   Address1        18 non-null     object 
 3   Address2        0 non-null      float64
 4   Town            31 non-null     object 
 5   County          12 non-null     object 
 6   PostCode        15 non-null     object 
 7   Region          10 non-null     object 
 8   OuterPostode    20 non-null     object 
 9   CountryID       31 non-null     int64  
 10  ClientType      31 non-null     object 
 11  ClientSize      26 non-null     object 
 12  ClientSince     31 non-null     object 
 13  IsCreditWorthy  31 non-null     int64  
 14  IsDealer        31 non-null     int64  
dtypes: float64(1), int64(4), object(10)
memory usage: 3.8+ KB
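
Because info() prints its report instead of returning it, the individual pieces are often easier to work with through other calls. The sketch below uses a small invented DataFrame (`toy`, not the car sales data) to show count() and dtypes as programmatic counterparts of the "Non-Null Count" and "Dtype" columns, and the `buf` parameter as one way to capture the printed report as text.

```python
import io
import pandas as pd

# Hypothetical toy frame with some missing values (not the car sales data)
toy = pd.DataFrame({
    "ClientID": [1, 2, 3],
    "County": ["Staffs", None, None],
    "Address2": [None, None, None],
})

non_null = toy.count()       # non-null entries per column
missing = toy.isna().sum()   # missing entries per column
dtypes = toy.dtypes          # data type of each column

# info() writes its report and returns None; buf= redirects the text
buffer = io.StringIO()
result = toy.info(buf=buffer)
report = buffer.getvalue()

print(non_null.to_dict())
print(missing.to_dict())
print(result is None)
print(report)
```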


In [21]:
# df.shape is a (rows, columns) tuple, so it can be unpacked directly
rows, columns = df.shape

print("No of Rows in 1st Data Frame : ", rows)
print("No of Columns in 1st Data Frame : ", columns)

r, c = df1.shape

print("No of Rows in 2nd Data Frame : ", r)
print("No of Columns in 2nd Data Frame : ", c)

No of Rows in 1st Data Frame :  31
No of Columns in 1st Data Frame :  15
No of Rows in 2nd Data Frame :  5
No of Columns in 2nd Data Frame :  6
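
For reference, the same dimensions can be read in a few equivalent ways. This sketch rebuilds the student marks data under the illustrative name `frame` so that it runs on its own, and compares shape with len() and the size attribute.

```python
import pandas as pd

# Same data as the student_marks example: 5 rows, 6 columns
frame = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Name": ["Ajay", "Vishal", "Ravi", "Nita", "Shital"],
    "Math": [25, 26, 21, 29, 30],
    "Chem": [26, 26, 25, 28, 30],
    "Bio": [25, 26, 22, 25, 28],
    "Phys": [26, 28, 21, 27, 30],
})

n_rows = frame.shape[0]        # same as len(frame)
n_cols = frame.shape[1]        # same as len(frame.columns)
n_cells = frame.size           # rows * columns

print(frame.shape)
print(n_rows, len(frame))
print(n_cols, len(frame.columns))
print(n_cells)
```

Note that shape is an attribute, not a method, so it is written without parentheses.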


# Fetching rows & columns in Pandas

Column Fetching

1. Single column: df['col_name']

2. Type of the result: type(df['col_name'])

3. Multiple columns of a DataFrame: df[['col1','col2','col3']]


Row Fetching

1. Single row: df.iloc[index_value]

2. Multiple rows: df.iloc[r1:r5] (the row at position r5 is not included)

3. Specific rows: df.iloc[[r1,r2,r3]] (note the inner list)


In [29]:
# Column fetching

df['Region']  # a single column at a time

type(df['Region'])  # a single column is a pandas Series

df[['Region','ClientName','County']]  # a list of names returns a DataFrame

,Region,ClientName,County
0,East Midlands,Aldo Motors,Staffs
1,Greater London Authority,Honest John,
2,West Midlands,Bright Orange,
3,North West,Cut'n'Shut,
4,Greater London Authority,Wheels'R'Us,
5,,Les Arnaqueurs,
6,,Crippen & Co,
7,,Rocky Riding,New York
8,,Voitures Diplomatiques S.A.,
9,,Karz,
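
The difference between single and double brackets is easy to miss, so here is a minimal sketch of it. The `clients` frame below copies the first three rows of the output above so the example can run without the CSV file; the variable names are illustrative.

```python
import pandas as pd

# First three rows of the output above, retyped so the example is self-contained
clients = pd.DataFrame({
    "ClientName": ["Aldo Motors", "Honest John", "Bright Orange"],
    "Region": ["East Midlands", "Greater London Authority", "West Midlands"],
    "County": ["Staffs", None, None],
})

one_column = clients["Region"]                # single brackets -> Series
one_column_frame = clients[["Region"]]        # list with one name -> DataFrame
several = clients[["Region", "ClientName"]]   # columns come back in the order requested

print(type(one_column).__name__)
print(type(one_column_frame).__name__)
print(list(several.columns))
```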


In [33]:
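# Row fetching

df.iloc[1]  # the row at position 1, returned as a Series

df.iloc[5:10]  # rows at positions 5 to 9 (10 is excluded)

df.iloc[[1,10,30]]  # specific rows, given as a list of positions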

,ClientID,ClientName,Address1,Address2,Town,County,PostCode,Region,OuterPostode,CountryID,ClientType,ClientSize,ClientSince,IsCreditWorthy,IsDealer
1,2,Honest John,99a Baker Street,,London,,NSW1 1A,Greater London Authority,EC,1,Dealer,Large,01-01-2000 00:00,0,0
10,11,Costa Del Speed,,,Madrid,,,,,5,Dealer,Small,31-05-2012 00:00,1,0
30,31,Classy Car Sales,30 Isabella St,,Pittsburgh,Pennsylvania,,,PA,3,Wholesaler,Small,04-01-2013,1,0
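
The three forms of iloc used above return different kinds of objects, and all of them work by position rather than by index label. The sketch below illustrates this on a small invented DataFrame (`toy`) whose index labels are deliberately letters, so that positions and labels cannot be confused.

```python
import pandas as pd

# Hypothetical frame with 6 rows; the index labels are deliberately not 0..5
toy = pd.DataFrame(
    {"ClientID": [11, 12, 13, 14, 15, 16]},
    index=["a", "b", "c", "d", "e", "f"],
)

single = toy.iloc[1]           # one position -> Series holding that row
block = toy.iloc[2:5]          # slice -> DataFrame with positions 2, 3, 4
picked = toy.iloc[[0, 3, 5]]   # list of positions -> DataFrame with those rows
last = toy.iloc[-1]            # negative positions count from the end

print(single["ClientID"])
print(list(block["ClientID"]))
print(list(picked.index))
print(last["ClientID"])
```

Writing `toy.iloc[0, 3, 5]` without the inner list does not select three rows: pandas reads the comma-separated values as one indexer per axis, and a DataFrame has only two axes, so the call fails.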
