A DataFrame in pandas is a two-dimensional, labeled data structure with columns of potentially different types. It is similar to a table in a database or an Excel spreadsheet, allowing for data manipulation and analysis.

In [41]:
# pandas dataframe
import pandas as pd
data={
    'student' : ['anit','rohan','rahul'],
    'age' : [19,20,18],
    'rank' : [1,2,3]
}
df=pd.DataFrame(data)
print(df)


  student  age  rank
0    anit   19     1
1   rohan   20     2
2   rahul   18     3


In [11]:
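df.loc[1,['student','age']] #row 1, two columns; not displayed, because a cell shows only its last expression
df.loc[1:2]
#loc is label-based: rows and columns are selected by their labels.
#A label slice includes both endpoints, so 1:2 returns the rows labeled 1 and 2.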

  student  age  rank
1   rohan   20     2
2   rahul   18     3


In [10]:
df.iloc[1:2]
#iloc is integer position-based: rows and columns are selected by position.
#A position slice excludes the end position, so 1:2 returns only the row at position 1.

  student  age  rank
1   rohan   20     2
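
The difference between label slices and position slices is easy to miss, so here is a minimal sketch that puts the two side by side. It rebuilds the same small table with the default integer index; the variable names by_label and by_position are only illustrative.

```python
import pandas as pd

# Same data as the cells above, with the default integer index 0, 1, 2.
df = pd.DataFrame({
    'student': ['anit', 'rohan', 'rahul'],
    'age': [19, 20, 18],
    'rank': [1, 2, 3],
})

by_label = df.loc[1:2]      # label slice: the end label 2 is included
by_position = df.iloc[1:2]  # position slice: the end position 2 is excluded

print(len(by_label))     # 2 rows (labels 1 and 2)
print(len(by_position))  # 1 row (position 1 only)

# Selecting a single cell by label vs. by position
print(df.loc[1, 'student'])  # rohan
print(df.iloc[1, 0])         # rohan
```

With the default index, labels and positions happen to coincide, which is why the two accessors look interchangeable here; once the index is changed to custom labels, only loc accepts those labels.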


In [42]:
#give the rows custom index labels instead of 0, 1, 2
df=pd.DataFrame(data,index=['d1','d2','d3'])
print(df)

   student  age  rank
d1    anit   19     1
d2   rohan   20     2
d3   rahul   18     3


In [13]:
for i in df: #iterating over a DataFrame yields its column labels, not its rows
    print(i)

student
age
rank
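
Since a plain for loop only visits column labels, a short sketch of the other common iteration helpers may be useful. It uses items() and itertuples(), both standard DataFrame methods; the names columns_seen, first_values and rows_seen are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {'student': ['anit', 'rohan', 'rahul'], 'age': [19, 20, 18], 'rank': [1, 2, 3]},
    index=['d1', 'd2', 'd3'],
)

# Iterating the DataFrame itself yields column labels only.
columns_seen = [col for col in df]

# items() yields (column label, Series) pairs.
first_values = {col: series.iloc[0] for col, series in df.items()}

# itertuples() yields one namedtuple per row; Index holds the row label.
rows_seen = [(row.Index, row.student, row.age) for row in df.itertuples()]

print(columns_seen)
print(first_values)
print(rows_seen)
```

For numeric work, column-wise vectorized operations are usually preferable to explicit row loops; the loops above are mainly for inspection.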


In [14]:
print(df.dtypes) #get data types of the columns

student    object
age         int64
rank        int64
dtype: object


In [15]:
print(df.ndim) #number of dimensions (axes); always 2 for a DataFrame

2


In [16]:
print(df.size) #total number of elements (rows x columns)

9


In [17]:
print(df.shape) #shape of the dataframe as (rows, columns)

(3, 3)
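
The three attributes ndim, size and shape are related, and a quick check makes the relationship explicit. This is a small sketch on the same table; n_rows and n_cols are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {'student': ['anit', 'rohan', 'rahul'], 'age': [19, 20, 18], 'rank': [1, 2, 3]},
    index=['d1', 'd2', 'd3'],
)

n_rows, n_cols = df.shape          # (rows, columns)
print(df.ndim == len(df.shape))    # True: a DataFrame always has 2 axes
print(df.size == n_rows * n_cols)  # True: size counts every cell
print(len(df) == n_rows)           # True: len() gives the row count

# A single column is a Series, which is 1-dimensional.
print(df['age'].ndim, df['age'].shape)  # 1 (3,)
```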


In [18]:
print(df.index) #getting the index of the dataframe

Index(['d1', 'd2', 'd3'], dtype='object')


In [21]:
print(df['rank'].idxmin()) #index label of the row with the minimum rank

d1


In [29]:
df.loc['d1','age']=20 #change the age in row d1
print(df)

   student  age  rank
d1    anit   20     1
d2   rohan   20     2
d3   rahul   18     3


In [32]:
res=df[df['age']==20]  #select all rows where age equals 20
print(res)

   student  age  rank
d1    anit   20     1
d2   rohan   20     2


In [33]:
print(df.T) #transpose of the dataframe

           d1     d2     d3
student  anit  rohan  rahul
age        20     20     18
rank        1      2      3
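
A common slip when changing a single cell is to write df['d1','age']=20 instead of df.loc['d1','age']=20. The sketch below contrasts the two on copies of the same table; the names updated and widened are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {'student': ['anit', 'rohan', 'rahul'], 'age': [19, 20, 18], 'rank': [1, 2, 3]},
    index=['d1', 'd2', 'd3'],
)

# loc[row_label, column_label] updates exactly one existing cell.
updated = df.copy()
updated.loc['d1', 'age'] = 20
print(updated.shape)  # (3, 3)

# Plain brackets treat the pair as a single column key, so this
# adds a fourth column labeled ('d1', 'age') with 20 in every row.
widened = df.copy()
widened['d1', 'age'] = 20
print(widened.shape)            # (3, 4)
print(widened['age'].tolist())  # [19, 20, 18], the age column is unchanged
```

If a printed DataFrame shows an unexpected column such as (d1, age), an assignment of the second kind is the likely cause, and recreating the DataFrame removes it.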


In [46]:

df.head(1) #first row only; with no argument, head() returns the first 5 rows


   student  age  rank
d1    anit   19     1


In [48]:
df.tail(1) #last row only; with no argument, tail() returns the last 5 rows

   student  age  rank
d3   rahul   18     3


In [54]:
data2={
    'student' : ['anit','rohan','xyz'],
    'marks' : [98,48,20]
}
df2=pd.DataFrame(data2)
print(df2)

  student  marks
0    anit     98
1   rohan     48
2     xyz     20


In [56]:
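#inner join on the student column: keeps only students present in both frames.
#merge does not keep the original row labels; the result gets a new 0..n-1 index.
result=pd.merge(df,df2,on='student',how='inner')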
print(result)

  student  age  rank  marks
0    anit   19     1     98
1   rohan   20     2     48


In [None]:
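#df.join(df2) on its own raises ValueError (columns overlap but no suffix specified):
#join aligns on the index, and both frames contain a 'student' column.
#Matching df's student column against df2's index works instead:
ans=df.join(df2.set_index('student'),on='student')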

merge:
More flexible for specifying join columns (on, left_on, right_on) and join types (how; the default is inner).
Joins on ordinary columns by default; can also join on the index (left_index, right_index).
Appends suffixes (_x and _y by default) to overlapping column names; the suffixes parameter overrides them.

join:
Joins DataFrames primarily on their indices; the default is a left join.
Raises an error when column names overlap unless lsuffix or rsuffix is given.
Useful for straightforward index-based joins, or for matching one column of the caller against the other frame's index (on).
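
To make the comparison concrete, here is a minimal sketch of join on the same two tables. It shows an index-based join after set_index, the error raised for overlapping column names, and the lsuffix/rsuffix workaround; the names joined, overlap_error and suffixed are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'student': ['anit', 'rohan', 'rahul'], 'age': [19, 20, 18], 'rank': [1, 2, 3]},
    index=['d1', 'd2', 'd3'],
)
df2 = pd.DataFrame({'student': ['anit', 'rohan', 'xyz'], 'marks': [98, 48, 20]})

# join aligns on the index, so move the shared key into the index first.
joined = df.set_index('student').join(df2.set_index('student'))  # left join by default

# With overlapping column names, join needs explicit suffixes.
try:
    df.join(df2)
    overlap_error = None
except ValueError as exc:
    overlap_error = exc

# Suffixes avoid the error, but the indexes (d1..d3 vs 0..2) share no labels,
# so every value coming from df2 is NaN.
suffixed = df.join(df2, lsuffix='_left', rsuffix='_right')

print(joined)
print(type(overlap_error).__name__)
print(list(suffixed.columns))
```

When the key lives in an ordinary column of both frames, as it does here, merge with on='student' is usually the more direct choice.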

In [62]:
res=pd.concat([df,df2]) #stack the rows of both frames; columns missing from one frame are filled with NaN
print(res)

   student   age  rank  marks
d1    anit  19.0   1.0    NaN
d2   rohan  20.0   2.0    NaN
d3   rahul  18.0   3.0    NaN
0     anit   NaN   NaN   98.0
1    rohan   NaN   NaN   48.0
2      xyz   NaN   NaN   20.0
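
Two concat options are often useful with frames like these. The sketch below uses ignore_index=True to renumber the rows and join='inner' to keep only the shared columns; stacked and shared are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {'student': ['anit', 'rohan', 'rahul'], 'age': [19, 20, 18], 'rank': [1, 2, 3]},
    index=['d1', 'd2', 'd3'],
)
df2 = pd.DataFrame({'student': ['anit', 'rohan', 'xyz'], 'marks': [98, 48, 20]})

# Stack rows and renumber them 0..5 instead of keeping d1..d3 and 0..2.
stacked = pd.concat([df, df2], ignore_index=True)

# Keep only the columns that both frames share.
shared = pd.concat([df, df2], join='inner', ignore_index=True)

print(stacked.shape)         # (6, 4)
print(list(shared.columns))  # ['student']
print(stacked['age'].dtype)  # float64, because NaN forces a float column
```

Unlike merge, concat does not match rows by key: anit and rohan each appear twice in the stacked result.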


DataFrame:
Structure: 2-dimensional data structure resembling a table with rows and columns.
Usage: Ideal for storing and manipulating structured, tabular data.
Creation: Can be created from dictionaries, files (CSV, Excel), or other data structures.
Operations: Supports element-wise arithmetic as well as row-wise and column-wise operations (chosen with the axis argument).
Indexing: Has a row index (index) and a column index (columns) for accessing data.

Series:
Structure: 1-dimensional labeled array capable of holding data of any type.
Usage: Typically used for representing a single column or row of data from a DataFrame.
Creation: Can be created from lists, arrays, or extracted from a DataFrame column.
Operations: Supports vectorized operations on its elements.
Indexing: Uses a single index for accessing elements.
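
The relationship between the two structures can be seen by pulling Series out of the DataFrame used above. This is a small sketch; ages, first_row, one_col_frame, next_year and marks are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {'student': ['anit', 'rohan', 'rahul'], 'age': [19, 20, 18], 'rank': [1, 2, 3]},
    index=['d1', 'd2', 'd3'],
)

ages = df['age']             # one column -> Series indexed by row labels
first_row = df.loc['d1']     # one row -> Series indexed by column labels
one_col_frame = df[['age']]  # a list of columns keeps it a DataFrame

print(type(ages).__name__, ages.ndim)
print(type(one_col_frame).__name__, one_col_frame.ndim)

# Vectorized operation: applies to every element without a loop.
next_year = ages + 1
print(next_year.tolist())  # [20, 21, 19]

# A Series can also be built directly from a list.
marks = pd.Series([98, 48, 20], index=['anit', 'rohan', 'xyz'], name='marks')
print(marks['rohan'])  # 48
```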