# Basics of pandas DataFrames

To install the pandas package, run "pip install pandas" in a terminal or "!pip install pandas" in a Jupyter Notebook cell.

A pandas DataFrame is a two-dimensional, size-mutable, heterogeneous tabular data structure in Python. It organizes data in labeled rows and columns, which makes it convenient to manipulate, clean, and analyze data, and it integrates well with other libraries for data visualization and analysis.

Import it into your Python script or Jupyter Notebook with the following statement:

import pandas as pd

After importing, you can create and manipulate DataFrames using the various functions and methods provided by the Pandas library.

To create a pandas DataFrame, use the pd.DataFrame() constructor. It accepts data in various forms, such as a dictionary, a list of dictionaries, a list of lists, or even another DataFrame. Here is an example built from a dictionary of lists:

In [1]:
import pandas as pd

data1 = {
    'day': ['2023-08-01', '2023-08-02', '2023-08-03', '2023-08-04', '2023-08-05', '2023-08-06'],
    'temp': [25.5, 24.8, 26.1, 23.6, 27.3, 28.7],
    'humidity': [60.0, 55.5, 58.9, 62.2, 50.1, 45.8],
    'windspeed': [12.3, 15.2, 10.5, 13.8, 14.6, 11.9]
}

In [2]:
df1 = pd.DataFrame(data1)
# df1['day'] = pd.to_datetime(df1['day'])  # Optionally convert the 'day' column to datetime type

Display the DataFrame:

In [3]:
df1

Unnamed: 0,day,temp,humidity,windspeed
0,2023-08-01,25.5,60.0,12.3
1,2023-08-02,24.8,55.5,15.2
2,2023-08-03,26.1,58.9,10.5
3,2023-08-04,23.6,62.2,13.8
4,2023-08-05,27.3,50.1,14.6
5,2023-08-06,28.7,45.8,11.9
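
The constructor also accepts the other input forms mentioned earlier. The following is a minimal sketch, using a shortened two-row version of the weather data; the names records, rows, df_from_records and df_from_rows are illustrative and do not appear in the notebook.

```python
import pandas as pd

# A list of dictionaries: each dictionary is one row, and its keys become column names.
records = [
    {'day': '2023-08-01', 'temp': 25.5},
    {'day': '2023-08-02', 'temp': 24.8},
]
df_from_records = pd.DataFrame(records)

# A list of lists: each inner list is one row; column names are passed separately.
rows = [
    ['2023-08-01', 25.5],
    ['2023-08-02', 24.8],
]
df_from_rows = pd.DataFrame(rows, columns=['day', 'temp'])

# Both constructions should describe the same table.
print(df_from_records.equals(df_from_rows))
```

Which form is more convenient usually depends on how the data arrives: column-oriented data fits a dictionary of lists, while row-oriented data (for example, parsed records) fits the two forms sketched here.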


The df.shape attribute in Pandas DataFrame gives you a tuple representing the dimensions of the DataFrame. The tuple contains two elements: the number of rows and the number of columns in the DataFrame.

In [4]:
df1.shape

(6, 4)

This means the DataFrame has 6 rows and 4 columns.

To print the number of rows and columns separately, unpack the tuple:

In [5]:
rows, columns = df1.shape
print("number of rows:", rows)
print("number of columns:", columns)

number of rows: 6
number of columns: 4


In [8]:
# Find out datatype of a column

df1['day'].dtypes

dtype('O')
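
The dtype 'O' stands for object, which here means the dates are stored as plain strings. As a hedged sketch of the conversion hinted at in the commented-out line earlier, the snippet below parses such a column with pd.to_datetime(); the small frame df is a stand-in created only for this illustration.

```python
import pandas as pd

# Stand-in frame with dates stored as strings (illustrative data).
df = pd.DataFrame({'day': ['2023-08-01', '2023-08-02'], 'temp': [25.5, 24.8]})

# Parse the date strings into real timestamps.
df['day'] = pd.to_datetime(df['day'])

# The column now has a datetime dtype instead of object.
print(df['day'].dtype)

# Datetime columns expose date parts through the .dt accessor.
print(df['day'].dt.day_name().tolist())
```

Once the column is a datetime type, operations such as sorting by date or extracting the month work on real dates rather than on text.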

In [10]:
# View the column labels of df1

df1.columns

Index(['day', 'temp', 'humidity', 'windspeed'], dtype='object')

Print selected columns

In [9]:
selected_columns = ['day', 'temp']
selected_df = df1[selected_columns]
selected_df

Unnamed: 0,day,temp
0,2023-08-01,25.5
1,2023-08-02,24.8
2,2023-08-03,26.1
3,2023-08-04,23.6
4,2023-08-05,27.3
5,2023-08-06,28.7


In [11]:
# Slice rows at positions 3 through 5 (the end position, 6, is excluded)

df1[3:6]

Unnamed: 0,day,temp,humidity,windspeed
3,2023-08-04,23.6,62.2,13.8
4,2023-08-05,27.3,50.1,14.6
5,2023-08-06,28.7,45.8,11.9
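
Plain bracket slicing selects rows by position. The sketch below shows what are, on a default integer index, the explicit equivalents: iloc for positions and loc for labels. The frame df is a small stand-in for illustration, not the notebook's df1.

```python
import pandas as pd

# Stand-in frame with the default integer index 0..5.
df = pd.DataFrame({'temp': [25.5, 24.8, 26.1, 23.6, 27.3, 28.7]})

# Position-based: rows at positions 3, 4 and 5 (the end position 6 is excluded).
by_position = df.iloc[3:6]

# Label-based: rows whose index labels run from 3 to 5 (the end label is included).
by_label = df.loc[3:5]

# With a default integer index both selections cover the same rows.
print(by_position.equals(by_label))
```

The difference matters once the index is no longer 0, 1, 2, ...: iloc keeps counting positions, whereas loc looks up the labels themselves.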


To remove one or more columns from a pandas DataFrame, use the drop() method.

By default, drop() returns a new DataFrame and leaves the original unchanged, so a dropped column is still present in the original afterwards. To keep the change, do one of the following:
* Pass inplace=True to drop() itself. -> df1.drop("humidity", axis="columns", inplace=True)
* Reassign the result. -> df1 = df1.drop("humidity", axis="columns")

The example below stores the result under a new name, so df1 itself stays intact:

In [10]:
# Drop specific columns
columns_to_drop = ['humidity', 'windspeed']
df_dropped = df1.drop(columns=columns_to_drop)

print(df_dropped)

          day  temp
0  2023-08-01  25.5
1  2023-08-02  24.8
2  2023-08-03  26.1
3  2023-08-04  23.6
4  2023-08-05  27.3
5  2023-08-06  28.7
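
A minimal sketch, on a small stand-in frame, of how drop() behaves with respect to the original DataFrame; the names df and without_humidity are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'day': ['2023-08-01', '2023-08-02'],
                   'temp': [25.5, 24.8],
                   'humidity': [60.0, 55.5]})

# drop() returns a new DataFrame; the original is left untouched.
without_humidity = df.drop(columns=['humidity'])
original_columns = list(df.columns)
print(original_columns)                # still contains 'humidity'
print(list(without_humidity.columns))  # 'humidity' is gone here

# Reassigning the result is one way to make the change stick.
df = df.drop(columns=['humidity'])
print(list(df.columns))
```

Reassignment is generally the easier pattern to read and chain; inplace=True achieves the same end result by mutating the object directly.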


The df.describe() method in Pandas provides a statistical summary of a DataFrame's numeric data. It calculates various descriptive statistics for each numeric column in the DataFrame, such as count, mean, standard deviation, minimum, 25th percentile, median (50th percentile), 75th percentile, and maximum.

In [11]:
df1.describe()

Unnamed: 0,temp,humidity,windspeed
count,6.0,6.0,6.0
mean,26.0,55.416667,13.05
std,1.813284,6.323106,1.787456
min,23.6,45.8,10.5
25%,24.975,51.45,12.0
50%,25.8,57.2,13.05
75%,27.0,59.725,14.4
max,28.7,62.2,15.2


The df.describe() method gives you a quick overview of the distribution and central tendency of the numeric data in the DataFrame, which can be useful for data exploration and initial analysis.
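
As a hedged illustration of what describe() covers, the sketch below compares the default summary with include='all' on a small stand-in frame (df, numeric_summary and full_summary are names chosen for this example).

```python
import pandas as pd

df = pd.DataFrame({'day': ['2023-08-01', '2023-08-02', '2023-08-03'],
                   'temp': [25.5, 24.8, 26.1]})

# Default: only numeric columns are summarized, so 'day' is left out.
numeric_summary = df.describe()
print(list(numeric_summary.columns))

# include='all' also summarizes non-numeric columns (count, unique, top, freq).
full_summary = df.describe(include='all')
print(list(full_summary.columns))

# Individual statistics are also available as separate methods.
print(df['temp'].mean(), df['temp'].max())
```

Statistics that do not apply to a column type (for example, the mean of a text column) show up as NaN in the include='all' summary.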


pd.concat() is a function in Pandas used to concatenate DataFrames along a specified axis. It allows you to join multiple DataFrames together, either vertically (along rows) or horizontally (along columns).

The basic syntax of pd.concat() is:

pd.concat(objs, axis=0, join='outer', ignore_index=False)

* objs: A sequence or mapping of DataFrames (or Series) to be concatenated.
* axis: The axis along which to concatenate: 0 stacks rows (vertical), 1 places columns side by side (horizontal).
* join: How to handle the labels on the other axis: 'outer' (the default) keeps the union and fills gaps with NaN, 'inner' keeps only the intersection. Unlike merge(), concat() does not accept 'left' or 'right'.
* ignore_index: If True, the original index labels along the concatenation axis are discarded and replaced with 0, 1, ..., n-1.
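
The effect of these parameters can be sketched on two small stand-in frames. The names left and right and their contents are made up for this illustration; right deliberately has an extra humidity column.

```python
import pandas as pd

left = pd.DataFrame({'day': ['2023-08-01', '2023-08-02'], 'temp': [25.5, 24.8]})
right = pd.DataFrame({'day': ['2023-08-03'], 'temp': [26.1], 'humidity': [58.9]})

# Default: original index labels are kept, so label 0 appears twice.
stacked = pd.concat([left, right])
print(stacked.index.tolist())

# ignore_index=True renumbers the rows 0..n-1.
renumbered = pd.concat([left, right], ignore_index=True)
print(renumbered.index.tolist())

# join='outer' (the default) keeps all columns, filling gaps with NaN;
# join='inner' keeps only the columns common to every input.
print(list(stacked.columns))
inner = pd.concat([left, right], join='inner', ignore_index=True)
print(list(inner.columns))
```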

In [14]:
data2= {
    'day': ['2023-08-07', '2023-08-08', '2023-08-09', '2023-08-10', '2023-08-11', '2023-08-12'],
    'temp': [28.3, 26.5, 27.8, 25.2, 29.1, 30.4],
    'humidity': [55.6, 52.1, 59.8, 63.4, 49.9, 44.6],
    'windspeed': [13.5, 16.8, 11.1, 14.9, 15.7, 12.2]
}

In [15]:
df2 = pd.DataFrame(data2)
df2

Unnamed: 0,day,temp,humidity,windspeed
0,2023-08-07,28.3,55.6,13.5
1,2023-08-08,26.5,52.1,16.8
2,2023-08-09,27.8,59.8,11.1
3,2023-08-10,25.2,63.4,14.9
4,2023-08-11,29.1,49.9,15.7
5,2023-08-12,30.4,44.6,12.2


In [17]:
df3=pd.concat([df1,df2])
df3

Unnamed: 0,day,temp,humidity,windspeed
0,2023-08-01,25.5,60.0,12.3
1,2023-08-02,24.8,55.5,15.2
2,2023-08-03,26.1,58.9,10.5
3,2023-08-04,23.6,62.2,13.8
4,2023-08-05,27.3,50.1,14.6
5,2023-08-06,28.7,45.8,11.9
0,2023-08-07,28.3,55.6,13.5
1,2023-08-08,26.5,52.1,16.8
2,2023-08-09,27.8,59.8,11.1
3,2023-08-10,25.2,63.4,14.9


In [20]:
df4=pd.concat([df1,df2],axis=0)
df4

Unnamed: 0,day,temp,humidity,windspeed
0,2023-08-01,25.5,60.0,12.3
1,2023-08-02,24.8,55.5,15.2
2,2023-08-03,26.1,58.9,10.5
3,2023-08-04,23.6,62.2,13.8
4,2023-08-05,27.3,50.1,14.6
5,2023-08-06,28.7,45.8,11.9
0,2023-08-07,28.3,55.6,13.5
1,2023-08-08,26.5,52.1,16.8
2,2023-08-09,27.8,59.8,11.1
3,2023-08-10,25.2,63.4,14.9


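The append() method of a pandas DataFrame appends the rows of another DataFrame to an existing one. It returns a new DataFrame containing the rows of both; the original is not modified. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the calls in this section only run on older versions; on current versions, use pd.concat() instead.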

The basic syntax of append() is:


DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=False)

* other: The DataFrame or Series (or a list of these) to be appended.
* ignore_index: If True, the resulting DataFrame gets a new continuous index, ignoring the original indices of both DataFrames.
* verify_integrity: If True, a ValueError is raised when the resulting index contains duplicates.
* sort: If True, the columns are sorted when the columns of the two DataFrames are not aligned.

In [18]:
# Append rows from df2 to df1
appended_df = df1.append(df2, ignore_index=True)

print(appended_df)

           day  temp  humidity  windspeed
0   2023-08-01  25.5      60.0       12.3
1   2023-08-02  24.8      55.5       15.2
2   2023-08-03  26.1      58.9       10.5
3   2023-08-04  23.6      62.2       13.8
4   2023-08-05  27.3      50.1       14.6
5   2023-08-06  28.7      45.8       11.9
6   2023-08-07  28.3      55.6       13.5
7   2023-08-08  26.5      52.1       16.8
8   2023-08-09  27.8      59.8       11.1
9   2023-08-10  25.2      63.4       14.9
10  2023-08-11  29.1      49.9       15.7
11  2023-08-12  30.4      44.6       12.2




In [12]:
# Making another dictionary

dict2 = {
    'DOB':['5/3/2005', '5/7/2006', '10/7/2005', '7/4/2011', '2/6/2008', '18/9/2010'],
    'rollno':['16', '20', '36', '13', '23', '17'],
    'Name':['virat', 'Vaibhav', 'mehul', 'kl', 'Shreyas', 'raina'],
    'marks':[80,90,78,79,65,99]
}

In [13]:
# Converting dict2 into a DataFrame df6

df6 = pd.DataFrame(dict2)
df6

Unnamed: 0,DOB,rollno,Name,marks
0,5/3/2005,16,virat,80
1,5/7/2006,20,Vaibhav,90
2,10/7/2005,36,mehul,78
3,7/4/2011,13,kl,79
4,2/6/2008,23,Shreyas,65
5,18/9/2010,17,raina,99


In [14]:
# Concatenating df1 and df6. axis=1 places the columns side by side.

df_final = pd.concat([df1, df6], axis=1, join='inner')
df_final

Unnamed: 0,day,temp,humidity,windspeed,DOB,rollno,Name,marks
0,2023-08-01,25.5,60.0,12.3,5/3/2005,16,virat,80
1,2023-08-02,24.8,55.5,15.2,5/7/2006,20,Vaibhav,90
2,2023-08-03,26.1,58.9,10.5,10/7/2005,36,mehul,78
3,2023-08-04,23.6,62.2,13.8,7/4/2011,13,kl,79
4,2023-08-05,27.3,50.1,14.6,2/6/2008,23,Shreyas,65
5,2023-08-06,28.7,45.8,11.9,18/9/2010,17,raina,99


In [15]:
# Find the dimensions of the DataFrame, i.e. the number of rows and columns

df_final.shape

(6, 8)

In [16]:
# Using append() to append two dataframes

df_final2 = df1.append(df6)
df_final2



Unnamed: 0,day,temp,humidity,windspeed,DOB,rollno,Name,marks
0,2023-08-01,25.5,60.0,12.3,,,,
1,2023-08-02,24.8,55.5,15.2,,,,
2,2023-08-03,26.1,58.9,10.5,,,,
3,2023-08-04,23.6,62.2,13.8,,,,
4,2023-08-05,27.3,50.1,14.6,,,,
5,2023-08-06,28.7,45.8,11.9,,,,
0,,,,,5/3/2005,16.0,virat,80.0
1,,,,,5/7/2006,20.0,Vaibhav,90.0
2,,,,,10/7/2005,36.0,mehul,78.0
3,,,,,7/4/2011,13.0,kl,79.0


In [17]:
df_final2.shape

(12, 8)

In [18]:
# Concatenating df1 and df6. Axis=0 means row-wise.

df_final3 = pd.concat([df1, df6], axis=0, join='outer')
df_final3

Unnamed: 0,day,temp,humidity,windspeed,DOB,rollno,Name,marks
0,2023-08-01,25.5,60.0,12.3,,,,
1,2023-08-02,24.8,55.5,15.2,,,,
2,2023-08-03,26.1,58.9,10.5,,,,
3,2023-08-04,23.6,62.2,13.8,,,,
4,2023-08-05,27.3,50.1,14.6,,,,
5,2023-08-06,28.7,45.8,11.9,,,,
0,,,,,5/3/2005,16.0,virat,80.0
1,,,,,5/7/2006,20.0,Vaibhav,90.0
2,,,,,10/7/2005,36.0,mehul,78.0
3,,,,,7/4/2011,13.0,kl,79.0


Hence, choose the axis and the join method carefully to suit your data; otherwise the result is padded with null (NaN) values wherever the labels of the inputs do not match.

In [19]:
df_final3.shape

(12, 8)
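
One way to check how many nulls a concatenation introduced is to count them with isna(). The sketch below uses two small stand-in frames with no columns in common; weather, students and their contents are illustrative.

```python
import pandas as pd

weather = pd.DataFrame({'day': ['2023-08-01', '2023-08-02'], 'temp': [25.5, 24.8]})
students = pd.DataFrame({'Name': ['virat', 'Vaibhav'], 'marks': [80, 90]})

# Row-wise outer concat of frames with different columns leaves NaN gaps:
# every row is missing the columns that belong to the other frame.
stacked = pd.concat([weather, students], axis=0, join='outer')
print(stacked.isna().sum())

# Column-wise concat aligns on the index, so matching indexes leave no gaps.
side_by_side = pd.concat([weather, students], axis=1)
print(side_by_side.isna().sum().sum())
```

A quick isna().sum() after combining frames is a cheap sanity check that the chosen axis and join did what was intended.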

In [1]:
import pandas as pd

df1=pd.read_excel(r"C:\Users\raval\Downloads\RESULT1.xlsx")
# print(df1)
df2=pd.read_excel(r"C:\Users\raval\Downloads\RESULT2.xlsx")
# print(df2)

# concat two dataframe
# a = pd.concat([df1,df2],ignore_index=True)
# print(a)
b = pd.concat([df1,df2])

In [2]:
b

Unnamed: 0,SRNO,BRANCH,NAME,TOTAL,PERCENTAGE,PASSFAIL
0,1,CE,RAMESH,210,70,1
1,2,CE,SURESH,150,50,1
2,3,IT,MAHESH,225,75,1
3,4,IT,NARESH,180,60,1
4,5,CE,JAYESH,90,30,0
0,1,EC,RATAN,150,50,1
1,2,CE,JATAN,270,90,1
2,3,IT,KATHAN,285,95,1
3,4,EC,NAYAN,195,65,1
4,5,IT,RAMAN,165,55,1
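
In the output above the index labels 0 to 4 occur twice, once per input file. As a hedged sketch, the keys parameter of pd.concat() can record which input each row came from, and ignore_index=True can renumber the rows instead. The frames result1 and result2 are hypothetical stand-ins built in memory, not the Excel files themselves.

```python
import pandas as pd

# Hypothetical stand-ins for the two result sheets.
result1 = pd.DataFrame({'SRNO': [1, 2], 'NAME': ['RAMESH', 'SURESH']})
result2 = pd.DataFrame({'SRNO': [1, 2], 'NAME': ['RATAN', 'JATAN']})

# keys= adds an outer index level recording which input each row came from.
labelled = pd.concat([result1, result2], keys=['RESULT1', 'RESULT2'])
print(labelled)

# Select all rows that came from the second input.
from_second = labelled.loc['RESULT2']
print(from_second)

# Alternatively, discard the old labels and renumber the rows.
renumbered = pd.concat([result1, result2], ignore_index=True)
print(renumbered.index.tolist())
```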


In [3]:
c=df1.append(df2)



In [4]:
c

Unnamed: 0,SRNO,BRANCH,NAME,TOTAL,PERCENTAGE,PASSFAIL
0,1,CE,RAMESH,210,70,1
1,2,CE,SURESH,150,50,1
2,3,IT,MAHESH,225,75,1
3,4,IT,NARESH,180,60,1
4,5,CE,JAYESH,90,30,0
0,1,EC,RATAN,150,50,1
1,2,CE,JATAN,270,90,1
2,3,IT,KATHAN,285,95,1
3,4,EC,NAYAN,195,65,1
4,5,IT,RAMAN,165,55,1


In [5]:
c.shape

(10, 6)

In [6]:
b.shape

(10, 6)

In [7]:
df1

Unnamed: 0,SRNO,BRANCH,NAME,TOTAL,PERCENTAGE,PASSFAIL
0,1,CE,RAMESH,210,70,1
1,2,CE,SURESH,150,50,1
2,3,IT,MAHESH,225,75,1
3,4,IT,NARESH,180,60,1
4,5,CE,JAYESH,90,30,0


In [8]:
d= pd.concat([df1,df2],axis=0)

In [9]:
d

Unnamed: 0,SRNO,BRANCH,NAME,TOTAL,PERCENTAGE,PASSFAIL
0,1,CE,RAMESH,210,70,1
1,2,CE,SURESH,150,50,1
2,3,IT,MAHESH,225,75,1
3,4,IT,NARESH,180,60,1
4,5,CE,JAYESH,90,30,0
0,1,EC,RATAN,150,50,1
1,2,CE,JATAN,270,90,1
2,3,IT,KATHAN,285,95,1
3,4,EC,NAYAN,195,65,1
4,5,IT,RAMAN,165,55,1


In [11]:
# e = df1.append(df2, axis=0)  # error: append() has no axis parameter; it always adds rows