# DataFrame

A pandas DataFrame is a two-dimensional, heterogeneous tabular data structure that is widely used in data science and data analysis. Its data is arranged in rows and columns, and every element can be addressed by a row label (the index) and a column label.

**Key features**
*   *Labeled axes*: Every row and column is identified by a label: rows by the index, columns by name.
*   *Heterogeneous data*: Different columns can hold different types of data, such as integers, floats, and strings.
*   *Size mutable*: You can add or remove rows and columns as needed.
*   *Data alignment*: As with a Series, operations between DataFrames automatically align the data on their labels.
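
The heterogeneous-data and data-alignment features listed above can be seen in a small sketch. The sample values and the variable names (`students`, `marks_a`, `marks_b`) are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
students = pd.DataFrame(
    {"name": ["Ravi", "Sai"], "roll": [101, 102], "cgpa": [8.5, 9.1]}
)

# Heterogeneous data: each column keeps its own dtype
print(students.dtypes)

# Data alignment: rows are matched by index label, not by position
marks_a = pd.DataFrame({"marks": [10, 20]}, index=["ravi", "sai"])
marks_b = pd.DataFrame({"marks": [1, 2]}, index=["sai", "teja"])
total = marks_a + marks_b

# Only 'sai' exists in both frames; the other labels get NaN
print(total)
```

Labels that appear in only one of the two frames produce missing values in the result, which is why alignment and missing-data handling often go together.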

**Creating a DataFrame**

There are several ways to create a DataFrame, for example from lists, dictionaries, or NumPy arrays.




In [2]:
import pandas as pd

In [None]:
#creating dataFrame from dictionary
dict_data={
    'name':['Ravi','Sai','Teja','Charan'],
    'roll':[101,102,103,104],
    'branch':['CSD','CSM','AI','MECH']
}
dataframe=pd.DataFrame(dict_data)
print(dataframe)

     name  roll branch
0    Ravi   101    CSD
1     Sai   102    CSM
2    Teja   103     AI
3  Charan   104   MECH


In [None]:
#DataFrame from list of dictionaries
list_dict=[
    {'name':'Ravi','roll':1,'branch':'csd'},
    {'name':'sai','roll':2,'branch':'cse'},
    {'name':'Teja','roll':3,'branch':'csm'}
]
dataframe2=pd.DataFrame(list_dict)
print(dataframe2)

   name  roll branch
0  Ravi     1    csd
1   sai     2    cse
2  Teja     3    csm


In [None]:
#dataframe from list of lists
list_data=[
    ['ravi','csm',1],
    ['sai','cse',2],
    ['teja','csd',3]
]
dataframe3=pd.DataFrame(list_data,columns=['name','branch','roll'])
print(dataframe3)

   name branch  roll
0  ravi    csm     1
1   sai    cse     2
2  teja    csd     3


In [None]:
#dataframe from numpy
import numpy as np
numpy_data=np.array([['ravi',1,'mech'],['teja',2,'csm'],['sai',3,'csd']])
dataframe4=pd.DataFrame(numpy_data,columns=['name','roll','branch'])
print(dataframe4)

   name roll branch
0  ravi    1   mech
1  teja    2    csm
2   sai    3    csd
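
One detail worth knowing about the NumPy example: an array built from mixed strings and numbers stores everything as strings, so the `roll` column is text rather than integers. The sketch below checks this and converts the column; the names `df` and `roll_is_text` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

numpy_data = np.array([["ravi", 1, "mech"], ["teja", 2, "csm"], ["sai", 3, "csd"]])
df = pd.DataFrame(numpy_data, columns=["name", "roll", "branch"])

# The mixed-type array holds strings, so 'roll' is not numeric yet
roll_is_text = isinstance(df.loc[0, "roll"], str)
print(roll_is_text)

# Convert the column to integers explicitly before doing arithmetic on it
df["roll"] = df["roll"].astype(int)
print(df["roll"].sum())
```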


**Accessing Data in DataFrame**

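You can access data in a DataFrame in several ways:


*   *Selecting columns*: `dataframe['name']` returns one column as a Series; a list of column names returns a DataFrame.
*   *Selecting rows*: `dataframe.loc[label]` returns the row with that index label as a Series.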




In [None]:
print(dataframe['name'])  # single column, returned as a Series
print(dataframe.loc[0])  # row whose index label is 0
print(dataframe[['name','roll']])  # multiple columns, returned as a DataFrame

0      Ravi
1       Sai
2      Teja
3    Charan
Name: name, dtype: object
name      Ravi
roll       101
branch     CSD
Name: 0, dtype: object
     name  roll
0    Ravi   101
1     Sai   102
2    Teja   103
3  Charan   104


In [None]:
#slicing
print(dataframe[0:2])

   name  roll branch
0  Ravi   101    CSD
1   Sai   102    CSM
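
The examples above use the default integer index, where labels and positions happen to coincide. As a rough sketch of the difference between label-based `loc` and position-based `iloc`, the frame below is indexed by roll number instead; that choice of index is an assumption made here so the two differ.

```python
import pandas as pd

# Indexed by roll number (an assumption for this sketch) so that
# index labels and row positions are not the same
df = pd.DataFrame(
    {"name": ["Ravi", "Sai", "Teja", "Charan"],
     "branch": ["CSD", "CSM", "AI", "MECH"]},
    index=[101, 102, 103, 104],
)

by_label = df.loc[102]            # row whose index label is 102
by_position = df.iloc[1]          # second row, whatever its label
label_slice = df.loc[101:103]     # label slices include both endpoints
position_slice = df.iloc[0:2]     # position slices exclude the end
cell = df.loc[103, "branch"]      # single value by row label and column name

print(len(label_slice), len(position_slice), cell)
```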


**DataFrame operations**

In [None]:
#adding new column
dataframe['age']=[19,20,18,21]
print(dataframe)

     name  roll branch  age
0    Ravi   101    CSD   19
1     Sai   102    CSM   20
2    Teja   103     AI   18
3  Charan   104   MECH   21


In [None]:
#deleting a column
dataframe.drop('branch', axis=1, inplace=True)

In [None]:
print(dataframe)

     name  roll  age
0    Ravi   101   19
1     Sai   102   20
2    Teja   103   18
3  Charan   104   21


In [None]:
#filter rows
print(dataframe[dataframe['age']>18])

     name  roll  age
0    Ravi   101   19
1     Sai   102   20
3  Charan   104   21
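
Filters can also combine several conditions, and columns can be dropped without modifying the original frame. The following is a minimal sketch on a fresh copy of the student data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ravi", "Sai", "Teja", "Charan"],
     "roll": [101, 102, 103, 104],
     "age": [19, 20, 18, 21]}
)

# Each condition needs its own parentheses; use & for "and", | for "or"
older_low_roll = df[(df["age"] > 18) & (df["roll"] < 104)]

# query() expresses the same filter as a string
same_rows = df.query("age > 18 and roll < 104")

# drop() without inplace=True returns a new DataFrame and leaves df unchanged
without_age = df.drop(columns="age")

print(older_low_roll)
print(without_age.columns.tolist())
```

Returning a new DataFrame instead of using `inplace=True` keeps the original data available, which helps when cells are re-run out of order.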


**Descriptive Statistics**

Descriptive statistics for a DataFrame can be obtained with the describe() method. By default it summarizes the numeric columns, reporting the count, mean, standard deviation, minimum, quartiles, and maximum.

In [None]:
dataframe2.describe()

       roll
count   3.0
mean    2.0
std     1.0
min     1.0
25%     1.5
50%     2.0
75%     2.5
max     3.0


In [None]:
dataframe2['roll'].mean()

2.0
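
Beyond `describe()` and `mean()`, several statistics can be requested in one call, and non-numeric columns can be included in the summary. A small sketch, rebuilding the same data as `dataframe2` under the name `df`:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ravi", "sai", "Teja"],
     "roll": [1, 2, 3],
     "branch": ["csd", "cse", "csm"]}
)

# Several statistics for one column in a single call
roll_stats = df["roll"].agg(["min", "max", "mean"])
print(roll_stats)

# include="all" also summarizes non-numeric columns (count, unique, top, freq)
full_summary = df.describe(include="all")
print(full_summary)
```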

**Handling Missing Data**

Pandas provides built-in methods for handling missing (null) values:

*   isnull(): Returns a Boolean DataFrame of the same shape, with True wherever a value is missing.
*   fillna(value): Replaces missing values with the given value.
*   dropna(): Removes the rows that contain missing values.



In [None]:
import numpy as np
data={
    'name':['ravi','teja','sai','charan'],
    'roll':[101,np.nan,103,np.nan]
}
dataframe5=pd.DataFrame(data)

In [None]:
dataframe5.isnull()  # True wherever a value is missing

    name   roll
0  False  False
1  False   True
2  False  False
3  False   True


In [None]:
dataframe5.fillna(dataframe5['roll'].mean())  # fill missing values with the mean of 'roll'

     name   roll
0    ravi  101.0
1    teja  102.0
2     sai  103.0
3  charan  102.0


In [None]:
dataframe5.dropna()  # drop rows that contain a missing value

   name   roll
0  ravi  101.0
2   sai  103.0
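
Note that `fillna()` and `dropna()` return new DataFrames and leave `dataframe5` itself unchanged. The sketch below rebuilds the same data as `df` and shows a few common variations: counting missing values per column, filling a single column, and dropping rows based on one column only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"name": ["ravi", "teja", "sai", "charan"],
     "roll": [101, np.nan, 103, np.nan]}
)

# Count missing values per column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Fill only the 'roll' column; the dict maps column names to fill values
filled = df.fillna({"roll": df["roll"].mean()})

# Drop rows only when 'roll' is missing
dropped = df.dropna(subset=["roll"])

# df is unchanged because neither call modified it in place
print(df["roll"].isnull().sum())
```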


In [None]:
# pivot table: mean of 'roll' for each name
dataframe2.pivot_table(values='roll',index='name')

      roll
name
Ravi   1.0
Teja   3.0
sai    2.0


**Data Handling in Pandas**

Pandas is a Python library for data manipulation and analysis. One of its core functions is reading data from various sources and writing processed data back to files. This includes:

1. Reading data from various file formats.

2. Writing data to different file formats.

3. Handling common data issues such as missing data and duplicates.
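
The three tasks above can be sketched in a few lines. To keep the example self-contained it writes to an in-memory text buffer rather than a file on disk; passing a file path to `to_csv` and `read_csv` works the same way. The sample data is made up for illustration.

```python
import io
import pandas as pd

df = pd.DataFrame(
    {"name": ["ravi", "sai", "sai"], "roll": [1, 2, 2]}
)

# Remove exact duplicate rows, keeping the first occurrence
deduped = df.drop_duplicates()

# Write to CSV text; index=False leaves the row labels out of the output
buffer = io.StringIO()
deduped.to_csv(buffer, index=False)

# Read the same text back into a new DataFrame
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```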

In [12]:
data=pd.read_csv('https://github.com/YBI-Foundation/Dataset/raw/main/BitcoinPricePrediction.csv')
print(data.head(5))

         Date  opening_price  highest_price  lowest_price  closing_price  \
0  2013-04-01           93.0          106.0          92.2          104.0   
1  2013-04-02          104.0          118.4          99.0          118.0   
2  2013-04-03          118.0          147.0         110.0          135.0   
3  2013-04-04          135.0          142.1         116.4          132.1   
4  2013-04-05          132.1          144.9         130.2          142.3   

   transactions_in_blockchain  avg_block_size  sent_by_adress  \
0                       52572          139256           48809   
1                       63095          175443           62276   
2                       63766          184209           69174   
3                       66738          221568           71753   
4                       61215          190067           69310   

   avg_mining_difficulty  avg_hashrate  ...  avg_transaction_value  \
0                6695826  6.550211e+13  ...                 2592.0   
1           

In [13]:
data.isnull()

Unnamed: 0,Date,opening_price,highest_price,lowest_price,closing_price,transactions_in_blockchain,avg_block_size,sent_by_adress,avg_mining_difficulty,avg_hashrate,...,avg_transaction_value,median_transaction_value,tweets,google_trends,active_addresses,top100_to_total_percentage,avg_fee_to_reward,number_of_coins_in_circulation,miner_revenue,next_day_closing_price
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3088,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3089,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3090,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3091,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


**DataFrame Applications in Data Science**


1.   *Data Cleaning*: DataFrames provide tools for handling missing data, removing duplicates, and transforming data, which are essential steps in data preprocessing.

2.   *Exploratory Data Analysis (EDA)*: DataFrames allow you to easily explore data through descriptive statistics, visualizations, and aggregations, helping to uncover insights and patterns.

3.   *Data Transformation*: You can reshape data using pivot tables, melt operations, and group by operations to prepare it for modeling.
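
As a brief illustration of the group-by and melt operations mentioned above, here is a sketch on a small, made-up marks table; the column names are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical marks data in "wide" form: one column per subject
df = pd.DataFrame(
    {"branch": ["csd", "csd", "cse"],
     "maths": [80, 90, 70],
     "physics": [60, 70, 50]}
)

# Group by: mean marks per branch
branch_means = df.groupby("branch")[["maths", "physics"]].mean()
print(branch_means)

# Melt: the subject columns become (subject, marks) rows, i.e. "long" form
long_form = df.melt(id_vars="branch", var_name="subject", value_name="marks")
print(long_form)
```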


**Conclusion**

The Pandas DataFrame is a versatile and powerful tool in data science. It provides a wide range of functionalities that make data manipulation, analysis, and visualization easier and more efficient. Whether you're cleaning raw data, performing exploratory analysis, or preparing data for machine learning models, DataFrames are at the core of these operations, making them indispensable for any data scientist.