# Unlocking the Magic of Data Manipulation: A Beginner's Guide to Pandas in Python

Are you ready to unleash the power of Pandas and transform the way you handle data? Look no further! This notebook is your ticket to mastering the art of data manipulation with Pandas, designed specifically for beginners working in Python.

Data manipulation plays a crucial role in data analysis and is often the first step in extracting valuable insights from your datasets. **Pandas**, with its user-friendly syntax, empowers beginners to dive into data manipulation effortlessly, opening doors to endless possibilities.

In this beginner's guide, we will take you on a journey through the basics of Pandas, equipping you with the essential knowledge to start exploring, cleaning, and transforming your data like a pro. No prior experience required!

Get ready to learn the fundamentals of Pandas syntax, including loading and examining datasets, filtering and sorting data, performing basic calculations, and more. Through step-by-step examples and exercises, we'll ensure you grasp each concept and gain hands-on experience along the way.

But that's not all – we want your opinion! As you progress through the notebook, we encourage you to vote on your favorite Pandas functionalities, exercises, and tips. Your feedback will help us tailor future notebooks and create a more interactive learning experience for beginners like you.

So, are you ready to embark on this data manipulation adventure? Let's dive into the world of Pandas and unlock the magic of transforming data with simplicity and elegance. Get ready to revolutionize your Python skills and become a confident data wrangler with Pandas!

# 1.) Let's import the needed libraries first [make sure pandas is installed on your local machine with the command below]
<p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
  import the needed libraries 
</p>



```
pip install pandas

```

In [1]:
import pandas as pd
import os 

# 2.) Importing the needed datasets using their paths : 

<p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
  import the needed Dataset.
</p>




In [2]:
# Read the CSV file at the given path into a DataFrame named "data".

data = pd.read_csv("/kaggle/input/countries-economy-gdp-and-everything/countries-of-the-world.csv")
print(data)



             Country                               Region  Population  \
0       Afghanistan         ASIA (EX. NEAR EAST)             31056997   
1           Albania   EASTERN EUROPE                          3581655   
2           Algeria   NORTHERN AFRICA                        32930091   
3    American Samoa   OCEANIA                                   57794   
4           Andorra   WESTERN EUROPE                            71201   
..               ...                                  ...         ...   
222       West Bank   NEAR EAST                               2460492   
223  Western Sahara   NORTHERN AFRICA                          273008   
224           Yemen   NEAR EAST                              21456188   
225          Zambia   SUB-SAHARAN AFRICA                     11502010   
226        Zimbabwe   SUB-SAHARAN AFRICA                     12236805   

     Area (sq. mi.) Pop. Density (per sq. mi.) Coastline (coast/area ratio)  \
0            647500                       48

In [3]:
data2 = pd.read_csv("/kaggle/input/500-albums-between-1955-2011/500 albums (1955-2011).csv")
print(data2)

     Number  Year                              Album           Artist  \
0       101  1955             In the Wee Small Hours    Frank Sinatra   
1        56  1956                      Elvis Presley    Elvis Presley   
2       308  1956         Songs for Swingin' Lovers!    Frank Sinatra   
3        50  1957              Here's Little Richard   Little Richard   
4       420  1957            The "Chirping" Crickets     The Crickets   
..      ...   ...                                ...              ...   
495     430  2007                    Vampire Weekend  Vampire Weekend   
496     494  2007               Oracular Spectacular             MGMT   
497     437  2008                     Tha Carter III        Lil Wayne   
498     353  2010  My Beautiful Dark Twisted Fantasy       Kanye West   
499     381  2011                 The Smile Sessions   The Beach Boys   

                     Genre                             Subgenre  
0                Jazz, Pop                     Big Band, 

In [4]:
data3 = pd.read_csv("/kaggle/input/tabular-playground-series-mar-2022/train.csv")
print(data3)

        row_id                 time  x  y direction  congestion
0            0  1991-04-01 00:00:00  0  0        EB          70
1            1  1991-04-01 00:00:00  0  0        NB          49
2            2  1991-04-01 00:00:00  0  0        SB          24
3            3  1991-04-01 00:00:00  0  1        EB          18
4            4  1991-04-01 00:00:00  0  1        NB          60
...        ...                  ... .. ..       ...         ...
848830  848830  1991-09-30 11:40:00  2  3        NB          54
848831  848831  1991-09-30 11:40:00  2  3        NE          28
848832  848832  1991-09-30 11:40:00  2  3        SB          68
848833  848833  1991-09-30 11:40:00  2  3        SW          17
848834  848834  1991-09-30 11:40:00  2  3        WB          24

[848835 rows x 6 columns]


# 3.) Printing the first 5 records with head() function : 

<p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
head() function
</p>




In [5]:
data.head()

Unnamed: 0,Country,Region,Population,Area (sq. mi.),Pop. Density (per sq. mi.),Coastline (coast/area ratio),Net migration,Infant mortality (per 1000 births),GDP ($ per capita),Literacy (%),Phones (per 1000),Arable (%),Crops (%),Other (%),Climate,Birthrate,Deathrate,Agriculture,Industry,Service
0,Afghanistan,ASIA (EX. NEAR EAST),31056997,647500,480,0,2306,16307,700.0,360,32,1213,22,8765,1,466,2034,38.0,24.0,38.0
1,Albania,EASTERN EUROPE,3581655,28748,1246,126,-493,2152,4500.0,865,712,2109,442,7449,3,1511,522,232.0,188.0,579.0
2,Algeria,NORTHERN AFRICA,32930091,2381740,138,4,-39,31,6000.0,700,781,322,25,9653,1,1714,461,101.0,6.0,298.0
3,American Samoa,OCEANIA,57794,199,2904,5829,-2071,927,8000.0,970,2595,10,15,75,2,2246,327,,,
4,Andorra,WESTERN EUROPE,71201,468,1521,0,66,405,19000.0,1000,4972,222,0,9778,3,871,625,,,


# 4.) Printing the last 5 records with tail() function : 

<p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
tail() function
</p>




In [6]:
data.tail()

Unnamed: 0,Country,Region,Population,Area (sq. mi.),Pop. Density (per sq. mi.),Coastline (coast/area ratio),Net migration,Infant mortality (per 1000 births),GDP ($ per capita),Literacy (%),Phones (per 1000),Arable (%),Crops (%),Other (%),Climate,Birthrate,Deathrate,Agriculture,Industry,Service
222,West Bank,NEAR EAST,2460492,5860,4199,0,298.0,1962.0,800.0,,1452.0,169,1897,6413,3,3167.0,392.0,9.0,28.0,63
223,Western Sahara,NORTHERN AFRICA,273008,266000,10,42,,,,,,2,0,9998,1,,,,,4
224,Yemen,NEAR EAST,21456188,527970,406,36,0.0,615.0,800.0,502.0,372.0,278,24,9698,1,4289.0,83.0,135.0,472.0,393
225,Zambia,SUB-SAHARAN AFRICA,11502010,752614,153,0,0.0,8829.0,800.0,806.0,82.0,708,3,929,2,41.0,1993.0,22.0,29.0,489
226,Zimbabwe,SUB-SAHARAN AFRICA,12236805,390580,313,0,0.0,6769.0,1900.0,907.0,268.0,832,34,9134,2,2801.0,2184.0,179.0,243.0,579


# 5.) Knowing more about the data : 



 <p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
Printing the data type of each column
</p>




In [7]:
data.dtypes

Country                                object
Region                                 object
Population                              int64
Area (sq. mi.)                          int64
Pop. Density (per sq. mi.)             object
Coastline (coast/area ratio)           object
Net migration                          object
Infant mortality (per 1000 births)     object
GDP ($ per capita)                    float64
Literacy (%)                           object
Phones (per 1000)                      object
Arable (%)                             object
Crops (%)                              object
Other (%)                              object
Climate                                object
Birthrate                              object
Deathrate                              object
Agriculture                            object
Industry                               object
Service                                object
dtype: object




  <p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
    The dtypes output above lists 20 columns: 17 of type object, 2 of type int64, and 1 of type float64.

</p>
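
Most of the `object` columns above look numeric. One possible cause, assuming the CSV writes decimals with a comma (for example `"48,5"`), is that pandas reads such values as text. The sketch below uses a small made-up CSV (the names and values are hypothetical, not taken from the dataset) to show how the `decimal` parameter of `pd.read_csv` changes what gets parsed.

```python
import io
import pandas as pd

# Hypothetical sample: numbers written with a decimal comma, quoted so the
# comma is not treated as a column separator.
csv_text = 'Name,Population,Density\nA,1000,"48,5"\nB,2000,"124,25"\n'

# Default settings: "." is the decimal mark, so "48,5" stays text.
raw = pd.read_csv(io.StringIO(csv_text))

# decimal="," tells pandas to treat the comma as the decimal mark.
parsed = pd.read_csv(io.StringIO(csv_text), decimal=",")

print(raw["Density"].tolist())
print(parsed["Density"].tolist())
print(parsed.dtypes)
```

If the real file behaves the same way, passing `decimal=","` when loading it may turn many of the `object` columns into `float64`.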





  <p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
 Select the columns whose data type is "object".

</p>


In [8]:
data.select_dtypes(object)

Unnamed: 0,Country,Region,Pop. Density (per sq. mi.),Coastline (coast/area ratio),Net migration,Infant mortality (per 1000 births),Literacy (%),Phones (per 1000),Arable (%),Crops (%),Other (%),Climate,Birthrate,Deathrate,Agriculture,Industry,Service
0,Afghanistan,ASIA (EX. NEAR EAST),480,000,2306,16307,360,32,1213,022,8765,1,466,2034,038,024,038
1,Albania,EASTERN EUROPE,1246,126,-493,2152,865,712,2109,442,7449,3,1511,522,0232,0188,0579
2,Algeria,NORTHERN AFRICA,138,004,-039,31,700,781,322,025,9653,1,1714,461,0101,06,0298
3,American Samoa,OCEANIA,2904,5829,-2071,927,970,2595,10,15,75,2,2246,327,,,
4,Andorra,WESTERN EUROPE,1521,000,66,405,1000,4972,222,0,9778,3,871,625,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
222,West Bank,NEAR EAST,4199,000,298,1962,,1452,169,1897,6413,3,3167,392,009,028,063
223,Western Sahara,NORTHERN AFRICA,10,042,,,,,002,0,9998,1,,,,,04
224,Yemen,NEAR EAST,406,036,0,615,502,372,278,024,9698,1,4289,83,0135,0472,0393
225,Zambia,SUB-SAHARAN AFRICA,153,000,0,8829,806,82,708,003,929,2,41,1993,022,029,0489





  <p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
 Select the columns whose data type is "int64".

</p>


In [9]:
data.select_dtypes('int64')

Unnamed: 0,Population,Area (sq. mi.)
0,31056997,647500
1,3581655,28748
2,32930091,2381740
3,57794,199
4,71201,468
...,...,...
222,2460492,5860
223,273008,266000
224,21456188,527970
225,11502010,752614


In [10]:
f"Categorical Columns {data.select_dtypes('object').columns.tolist()}"

"Categorical Columns ['Country', 'Region', 'Pop. Density (per sq. mi.)', 'Coastline (coast/area ratio)', 'Net migration', 'Infant mortality (per 1000 births)', 'Literacy (%)', 'Phones (per 1000)', 'Arable (%)', 'Crops (%)', 'Other (%)', 'Climate', 'Birthrate', 'Deathrate', 'Agriculture', 'Industry', 'Service']"





  <p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
Retrieving the names of all columns whose dtype is int64:
</p>


In [11]:
f"Numerical Columns {data.select_dtypes('int64').columns.tolist()}"

"Numerical Columns ['Population', 'Area (sq. mi.)']"






  <p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
Retrieving the first five records of the int64 columns

</p>


In [12]:
data.select_dtypes('int64').head()

Unnamed: 0,Population,Area (sq. mi.)
0,31056997,647500
1,3581655,28748
2,32930091,2381740
3,57794,199
4,71201,468
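
Selecting `'int64'` leaves out float columns such as `GDP ($ per capita)`. As a small sketch on a made-up frame (the column names here are hypothetical), `include="number"` picks up every numeric dtype at once, and `exclude="number"` returns the rest:

```python
import pandas as pd

# Hypothetical frame with one text, one integer, and one float column
df = pd.DataFrame({
    "name": ["a", "b"],
    "count": [1, 2],
    "ratio": [0.5, 1.5],
})

numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(other_cols)
```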









  <p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
Doing the same for objects:

</p>


In [13]:
data.select_dtypes('object').head()

Unnamed: 0,Country,Region,Pop. Density (per sq. mi.),Coastline (coast/area ratio),Net migration,Infant mortality (per 1000 births),Literacy (%),Phones (per 1000),Arable (%),Crops (%),Other (%),Climate,Birthrate,Deathrate,Agriculture,Industry,Service
0,Afghanistan,ASIA (EX. NEAR EAST),480,0,2306,16307,360,32,1213,22,8765,1,466,2034,38.0,24.0,38.0
1,Albania,EASTERN EUROPE,1246,126,-493,2152,865,712,2109,442,7449,3,1511,522,232.0,188.0,579.0
2,Algeria,NORTHERN AFRICA,138,4,-39,31,700,781,322,25,9653,1,1714,461,101.0,6.0,298.0
3,American Samoa,OCEANIA,2904,5829,-2071,927,970,2595,10,15,75,2,2246,327,,,
4,Andorra,WESTERN EUROPE,1521,0,66,405,1000,4972,222,0,9778,3,871,625,,,


# 6.) Formatting the dataframes: 








  <p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
 Formatting the dataframes: coloring cells by value with style.background_gradient()


</p>


In [14]:
data.select_dtypes('int64').head(50).style.background_gradient(cmap = "RdBu")

Unnamed: 0,Population,Area (sq. mi.)
0,31056997,647500
1,3581655,28748
2,32930091,2381740
3,57794,199
4,71201,468
5,12127071,1246700
6,13477,102
7,69108,443
8,39921833,2766890
9,2976372,29800


In [15]:
data2.select_dtypes('int64').head(50).style.background_gradient(cmap = "BuGn")

Unnamed: 0,Number,Year
0,101,1955
1,56,1956
2,308,1956
3,50,1957
4,420,1957
5,154,1958
6,12,1959
7,103,1959
8,248,1959
9,265,1959


# 7.) Rounding off the data : 


  <p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
 7.) Rounding off the data : 
 


</p>


In [16]:
data3.head()

Unnamed: 0,row_id,time,x,y,direction,congestion
0,0,1991-04-01 00:00:00,0,0,EB,70
1,1,1991-04-01 00:00:00,0,0,NB,49
2,2,1991-04-01 00:00:00,0,0,SB,24
3,3,1991-04-01 00:00:00,0,1,EB,18
4,4,1991-04-01 00:00:00,0,1,NB,60



<p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
  Let's convert the "time" column from text into real datetime values.
 


</p>


In [17]:
data3['time'] = pd.to_datetime(data3['time'])
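
To see what the conversion buys us, here is a minimal sketch on two made-up timestamps written in the same text format as the `time` column. Once the values are datetimes, the `.dt` accessor exposes parts such as the hour and minute.

```python
import pandas as pd

# Hypothetical timestamps stored as plain text
times = pd.Series(["1991-04-01 00:00:00", "1991-04-01 00:20:00"])

# Convert the text into datetime values
parsed = pd.to_datetime(times)

print(parsed.dtype)               # a datetime64 dtype instead of text
print(parsed.dt.hour.tolist())    # hour of each timestamp
print(parsed.dt.minute.tolist())  # minute of each timestamp
```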



<p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
  NOTE :
 


</p>

The line of code `data3_dt = data3.set_index('time')` takes the DataFrame `data3` and returns a new DataFrame, `data3_dt`, in which the 'time' column becomes the index, that is, the label that identifies each row. `data3` itself is left unchanged. Indexing by time makes it easier to select and work with rows based on their timestamps.

In [18]:
data3_dt = data3.set_index('time') # The 'data3_dt' variable is being set as a new DataFrame 
# by using the 'set_index' function with the column 'time' as the new index.
print(data3_dt)

                     row_id  x  y direction  congestion
time                                                   
1991-04-01 00:00:00       0  0  0        EB          70
1991-04-01 00:00:00       1  0  0        NB          49
1991-04-01 00:00:00       2  0  0        SB          24
1991-04-01 00:00:00       3  0  1        EB          18
1991-04-01 00:00:00       4  0  1        NB          60
...                     ... .. ..       ...         ...
1991-09-30 11:40:00  848830  2  3        NB          54
1991-09-30 11:40:00  848831  2  3        NE          28
1991-09-30 11:40:00  848832  2  3        SB          68
1991-09-30 11:40:00  848833  2  3        SW          17
1991-09-30 11:40:00  848834  2  3        WB          24

[848835 rows x 5 columns]


The 'data3_dt.index' command gives you the list of labels that are used to identify the rows in the DataFrame called 'data3_dt'. It's like a special column that helps you find and organize the data easily.

In [19]:
data3_dt.index

DatetimeIndex(['1991-04-01 00:00:00', '1991-04-01 00:00:00',
               '1991-04-01 00:00:00', '1991-04-01 00:00:00',
               '1991-04-01 00:00:00', '1991-04-01 00:00:00',
               '1991-04-01 00:00:00', '1991-04-01 00:00:00',
               '1991-04-01 00:00:00', '1991-04-01 00:00:00',
               ...
               '1991-09-30 11:40:00', '1991-09-30 11:40:00',
               '1991-09-30 11:40:00', '1991-09-30 11:40:00',
               '1991-09-30 11:40:00', '1991-09-30 11:40:00',
               '1991-09-30 11:40:00', '1991-09-30 11:40:00',
               '1991-09-30 11:40:00', '1991-09-30 11:40:00'],
              dtype='datetime64[ns]', name='time', length=848835, freq=None)
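
One practical benefit of a datetime index is selecting rows by date with `.loc`. The sketch below uses a made-up three-row frame (the values are hypothetical) rather than the full traffic data:

```python
import pandas as pd

# Hypothetical mini version of a time-indexed table
df = pd.DataFrame({
    "time": pd.to_datetime([
        "1991-04-01 00:00:00",
        "1991-04-01 00:20:00",
        "1991-04-02 00:00:00",
    ]),
    "congestion": [70, 49, 24],
}).set_index("time")

# All rows that fall on a given day
first_day = df.loc["1991-04-01"]

# All rows between two points in time (both ends included)
window = df.loc["1991-04-01 00:10":"1991-04-02 00:00"]

print(first_day)
print(window)
```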

In [20]:
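data3_dt.index.round('S')
# Round each timestamp to the nearest second. These timestamps have no
# sub-second part, so the output is unchanged.
# Note: newer pandas releases prefer the lowercase alias 's' for seconds.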

DatetimeIndex(['1991-04-01 00:00:00', '1991-04-01 00:00:00',
               '1991-04-01 00:00:00', '1991-04-01 00:00:00',
               '1991-04-01 00:00:00', '1991-04-01 00:00:00',
               '1991-04-01 00:00:00', '1991-04-01 00:00:00',
               '1991-04-01 00:00:00', '1991-04-01 00:00:00',
               ...
               '1991-09-30 11:40:00', '1991-09-30 11:40:00',
               '1991-09-30 11:40:00', '1991-09-30 11:40:00',
               '1991-09-30 11:40:00', '1991-09-30 11:40:00',
               '1991-09-30 11:40:00', '1991-09-30 11:40:00',
               '1991-09-30 11:40:00', '1991-09-30 11:40:00'],
              dtype='datetime64[ns]', name='time', length=848835, freq=None)
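
Because the timestamps above already sit on whole seconds, rounding to seconds changes nothing. As a sketch with two made-up timestamps, rounding to a coarser unit such as ten minutes makes the effect visible, and `floor` and `ceil` show the round-down and round-up variants:

```python
import pandas as pd

# Hypothetical timestamps that do not fall on a 10-minute boundary
idx = pd.DatetimeIndex(["1991-04-01 00:04:00", "1991-04-01 00:07:00"])

rounded = idx.round("10min")  # nearest 10 minutes
floored = idx.floor("10min")  # always down
ceiled = idx.ceil("10min")    # always up

print(rounded.strftime("%H:%M").tolist())
print(floored.strftime("%H:%M").tolist())
print(ceiled.strftime("%H:%M").tolist())
```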

# 8.) Start and end for data : 



<p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
 Start and end for data


</p>


In [21]:
data2.head() # First five records

Unnamed: 0,Number,Year,Album,Artist,Genre,Subgenre
0,101,1955,In the Wee Small Hours,Frank Sinatra,"Jazz, Pop","Big Band, Ballad"
1,56,1956,Elvis Presley,Elvis Presley,Rock,"Rock & Roll, Rockabilly"
2,308,1956,Songs for Swingin' Lovers!,Frank Sinatra,"Jazz, Pop","Vocal, Easy Listening"
3,50,1957,Here's Little Richard,Little Richard,"Rock, Blues","Rock & Roll, Rhythm & Blues"
4,420,1957,"The ""Chirping"" Crickets",The Crickets,"Rock, Pop","Rockabilly, Rock & Roll"


In [22]:
data2.tail() # last 5 records 

Unnamed: 0,Number,Year,Album,Artist,Genre,Subgenre
495,430,2007,Vampire Weekend,Vampire Weekend,Rock,Indie Rock
496,494,2007,Oracular Spectacular,MGMT,"Electronic, Rock, Pop","Synth-pop, Indie Rock"
497,437,2008,Tha Carter III,Lil Wayne,"Hip Hop, Funk / Soul","RnB/Swing, Screw, Pop Rap, Thug Rap"
498,353,2010,My Beautiful Dark Twisted Fantasy,Kanye West,Hip Hop,
499,381,2011,The Smile Sessions,The Beach Boys,Rock,"Pop Rock, Psychedelic Rock"


In [23]:
data2.head(2) # prints first two records . 

Unnamed: 0,Number,Year,Album,Artist,Genre,Subgenre
0,101,1955,In the Wee Small Hours,Frank Sinatra,"Jazz, Pop","Big Band, Ballad"
1,56,1956,Elvis Presley,Elvis Presley,Rock,"Rock & Roll, Rockabilly"


In [24]:
data2.tail(4) # Will give you last 4 records of table

Unnamed: 0,Number,Year,Album,Artist,Genre,Subgenre
496,494,2007,Oracular Spectacular,MGMT,"Electronic, Rock, Pop","Synth-pop, Indie Rock"
497,437,2008,Tha Carter III,Lil Wayne,"Hip Hop, Funk / Soul","RnB/Swing, Screw, Pop Rap, Thug Rap"
498,353,2010,My Beautiful Dark Twisted Fantasy,Kanye West,Hip Hop,
499,381,2011,The Smile Sessions,The Beach Boys,Rock,"Pop Rock, Psychedelic Rock"


# 9.) Exploratory data analysis  :


<p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
 EXPLORATORY DATA ANALYSIS.


</p>


In [25]:
# Columns of the DataFrame
data2.columns


Index(['Number', 'Year', 'Album', 'Artist', 'Genre', 'Subgenre'], dtype='object')


<p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
Description of the index of each dataset
</p>


In [26]:
data.index

RangeIndex(start=0, stop=227, step=1)

In [27]:
data2.index

RangeIndex(start=0, stop=500, step=1)

In [28]:
data3.index

RangeIndex(start=0, stop=848835, step=1)




<p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
Let's retrieve some information about our datasets:
</p>


In [29]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 227 entries, 0 to 226
Data columns (total 20 columns):
 #   Column                              Non-Null Count  Dtype  
---  ------                              --------------  -----  
 0   Country                             227 non-null    object 
 1   Region                              227 non-null    object 
 2   Population                          227 non-null    int64  
 3   Area (sq. mi.)                      227 non-null    int64  
 4   Pop. Density (per sq. mi.)          227 non-null    object 
 5   Coastline (coast/area ratio)        227 non-null    object 
 6   Net migration                       224 non-null    object 
 7   Infant mortality (per 1000 births)  224 non-null    object 
 8   GDP ($ per capita)                  226 non-null    float64
 9   Literacy (%)                        209 non-null    object 
 10  Phones (per 1000)                   223 non-null    object 
 11  Arable (%)                          225 non-n







<p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
As we can see in the output above, the non-null counts differ between columns because missing (null) values are not counted.

</p>
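
To count the missing values directly, `isna().sum()` can be used alongside `count()`. A minimal sketch on a made-up frame (the column names and values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with some missing entries
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": ["x", None, None],
})

missing = df.isna().sum()  # number of missing values per column
non_null = df.count()      # number of non-null values per column

print(missing)
print(non_null)
```

For every column, the two numbers add up to the total number of rows.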


In [30]:
data2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Number    500 non-null    int64 
 1   Year      500 non-null    int64 
 2   Album     500 non-null    object
 3   Artist    500 non-null    object
 4   Genre     500 non-null    object
 5   Subgenre  500 non-null    object
dtypes: int64(2), object(4)
memory usage: 23.6+ KB


In [31]:
data3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 848835 entries, 0 to 848834
Data columns (total 6 columns):
 #   Column      Non-Null Count   Dtype         
---  ------      --------------   -----         
 0   row_id      848835 non-null  int64         
 1   time        848835 non-null  datetime64[ns]
 2   x           848835 non-null  int64         
 3   y           848835 non-null  int64         
 4   direction   848835 non-null  object        
 5   congestion  848835 non-null  int64         
dtypes: datetime64[ns](1), int64(4), object(1)
memory usage: 38.9+ MB










<p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
Describing the dataframe : 

</p>


In [32]:
data2.describe 

<bound method NDFrame.describe of      Number  Year                              Album           Artist  \
0       101  1955             In the Wee Small Hours    Frank Sinatra   
1        56  1956                      Elvis Presley    Elvis Presley   
2       308  1956         Songs for Swingin' Lovers!    Frank Sinatra   
3        50  1957              Here's Little Richard   Little Richard   
4       420  1957            The "Chirping" Crickets     The Crickets   
..      ...   ...                                ...              ...   
495     430  2007                    Vampire Weekend  Vampire Weekend   
496     494  2007               Oracular Spectacular             MGMT   
497     437  2008                     Tha Carter III        Lil Wayne   
498     353  2010  My Beautiful Dark Twisted Fantasy       Kanye West   
499     381  2011                 The Smile Sessions   The Beach Boys   

                     Genre                             Subgenre  
0                Jazz, 










<p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
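Without parentheses, describe only refers to the method, so the cell above prints the method object together with the DataFrame. Calling it with parentheses, describe(), returns the statistical summary of the numeric columns.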


</p>




In [33]:
data2.describe()


Unnamed: 0,Number,Year
count,500.0,500.0
mean,250.5,1979.27
std,144.481833,12.093701
min,1.0,1955.0
25%,125.75,1970.0
50%,250.5,1976.0
75%,375.25,1988.0
max,500.0,2011.0
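
By default `describe()` summarizes only the numeric columns. As a sketch on a made-up frame (hypothetical values), passing `include="all"` also reports the count, number of unique values, most frequent value, and its frequency for text columns:

```python
import pandas as pd

# Hypothetical frame with one numeric and one text column
df = pd.DataFrame({
    "year": [1955, 1956, 1956],
    "genre": ["Jazz", "Rock", "Rock"],
})

# Summarize every column, not just the numeric ones
summary = df.describe(include="all")
print(summary)
```

Statistics that do not apply to a column (for example the mean of a text column) are shown as NaN.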













<p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
To get the number of non-null values in each column
</p>




In [34]:
data2.count()

Number      500
Year        500
Album       500
Artist      500
Genre       500
Subgenre    500
dtype: int64

In [35]:
data3.count()

row_id        848835
time          848835
x             848835
y             848835
direction     848835
congestion    848835
dtype: int64
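
Note that `count()` reports non-null values per column, not the size of the table. For the number of rows and columns, `shape` is the usual tool. A minimal sketch on a made-up frame:

```python
import pandas as pd

# Hypothetical frame with one missing value in column "b"
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": ["x", "y", None],
})

n_rows, n_cols = df.shape  # (rows, columns)
print(n_rows, n_cols)
print(len(df.columns))     # number of columns only
print(df.count())          # non-null values per column
```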

# 10.) GroupBy Function : 


















<p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
Group By Function : 
</p>




In [36]:

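# Create a sample dictionary. It is named "group_data" so that it does not
# overwrite the "data" DataFrame loaded earlier.
group_data = {
    'Group': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Value': [1, 2, 3, 4, 5, 6]
}

# Convert the dictionary into a DataFrame
df = pd.DataFrame(group_data)

# Calculate the mean value for each group and assign it to a new column.
# transform('mean') returns one value per original row, so it lines up with df.
df['GroupMean'] = df.groupby('Group')['Value'].transform('mean')

print(df)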


  Group  Value  GroupMean
0     A      1        3.0
1     B      2        4.0
2     A      3        3.0
3     B      4        4.0
4     A      5        3.0
5     B      6        4.0
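
`transform` keeps one row per original record. When one row per group is wanted instead, `agg` is the more common choice. A short sketch using the same sample values:

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["A", "B", "A", "B", "A", "B"],
    "Value": [1, 2, 3, 4, 5, 6],
})

# One row per group, with the mean and the sum of "Value"
per_group = df.groupby("Group")["Value"].agg(["mean", "sum"])
print(per_group)
```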


# 11.) crosstab() : 













<p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
crosstab()
</p>





The crosstab() function computes a cross-tabulation table, which is a frequency table that shows the distribution of one or more variables over multiple dimensions. It can be handy for analyzing categorical data.

In [37]:


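# Create a sample dictionary. It is named "crosstab_data" so that it does not
# overwrite the "data" DataFrame loaded earlier.
crosstab_data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Value': [1, 2, 3, 4, 5, 6]
}
# Convert the dictionary into a DataFrame
df = pd.DataFrame(crosstab_data)

# Compute a cross-tabulation table: how often each Value occurs in each Category
cross_tab = pd.crosstab(df['Category'], df['Value'])
print(cross_tab)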


Value     1  2  3  4  5  6
Category                  
A         1  0  1  0  1  0
B         0  1  0  1  0  1
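
A cross-tabulation is usually more informative when both variables are categorical. The sketch below uses made-up direction and level labels (hypothetical values) and adds row and column totals with `margins=True`:

```python
import pandas as pd

# Hypothetical categorical data
df = pd.DataFrame({
    "direction": ["EB", "NB", "EB", "NB", "EB"],
    "level": ["low", "low", "high", "high", "high"],
})

# Frequency of each (direction, level) pair
table = pd.crosstab(df["direction"], df["level"])

# Same table with an extra "All" row and column holding the totals
table_with_totals = pd.crosstab(df["direction"], df["level"], margins=True)

print(table)
print(table_with_totals)
```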


# 12.) where(): 

<p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
where()
</p>





The where() function keeps the values for which a condition is True and replaces all other values (with NaN by default). It can be used to selectively replace values without altering the shape of the DataFrame or Series.

In [38]:
import numpy as np

# Create a sample dictionary. It is named "where_data" so that it does not
# overwrite the "data" DataFrame loaded earlier.
where_data = {
    'A': [1, 2, 3, 4, 5],
    'B': [6, 7, 8, 9, 10]
}

# Convert the dictionary into a DataFrame
df = pd.DataFrame(where_data)

# Keep the values in column 'B' where column 'A' is less than 3;
# replace all other values in 'B' with NaN
df['B'] = df['B'].where(df['A'] < 3, np.nan)
print(df)


   A    B
0  1  6.0
1  2  7.0
2  3  NaN
3  4  NaN
4  5  NaN
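
The replacement value does not have to be NaN, and `mask()` does the opposite of `where()`: it replaces the values for which the condition is True. A minimal sketch on a made-up Series:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# where: keep values that satisfy the condition, replace the rest with 0
kept = s.where(s < 3, other=0)

# mask: replace values that satisfy the condition with 0, keep the rest
masked = s.mask(s < 3, other=0)

print(kept.tolist())
print(masked.tolist())
```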


# THANK YOU :



<p style="padding: 10px; background-color: #000000; margin: 0; color: ORANGE; font-family: New Times Roman; font-size: 135%; text-align: center; border-radius: 5px; overflow: hidden; font-weight: 500;">
THANK YOU . 
</p>





I hope you all learned something, or at least that this was a useful revision for most of you. I would love to hear your opinions, so let me know if I am missing something in this notebook. And do upvote, as it keeps me motivated and helps me provide better content.

Thanking you in anticipation.

![Alt Text](https://media.giphy.com/media/9V8You0A1G64JmiBUi/giphy.gif)


