
# 🔴 **Task 16** 
Topics: Combining and Merging Datasets, Reshaping Data

Resource: https://drive.google.com/file/d/1ILp88f3u_KgJ_nlhTSsAVGdmZlICGhhd/view?usp=share_link


# **Combining and Merging Datasets**

**Concatenating DataFrames**

Use `pd.concat()` to stack two or more DataFrames vertically (row-wise). Each frame keeps its original row labels, so in the example below the index values 0 to 3 appear twice in the result:

In [1]:
import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})

df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']})

# Concatenate the DataFrames vertically
result = pd.concat([df1, df2])

print(result)

    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2
3  A3  B3  C3  D3
0  A4  B4  C4  D4
1  A5  B5  C5  D5
2  A6  B6  C6  D6
3  A7  B7  C7  D7


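To replace the repeated labels with a fresh sequential index, chain the `reset_index()` method. Passing `drop=True` discards the old index instead of adding it as a new column: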

In [2]:
result = pd.concat([df1, df2]).reset_index(drop=True)

print(result)

    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2
3  A3  B3  C3  D3
4  A4  B4  C4  D4
5  A5  B5  C5  D5
6  A6  B6  C6  D6
7  A7  B7  C7  D7
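As a complement to the two cells above, here is a minimal sketch of two other `pd.concat()` options: `ignore_index=True`, which renumbers the rows during concatenation, and `axis=1`, which places frames side by side. The frames `left` and `right` and the `_r` suffix are made up for illustration.

```python
import pandas as pd

# Two small illustrative frames; both have the default index 0, 1
left = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
right = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# ignore_index=True drops the original row labels and numbers the rows 0..n-1
stacked = pd.concat([left, right], ignore_index=True)
print(stacked)

# axis=1 concatenates column-wise, aligning rows on the index.
# add_suffix renames right's columns so the result has no duplicate names.
side_by_side = pd.concat([left, right.add_suffix('_r')], axis=1)
print(side_by_side)
```

For a plain row-wise concatenation, `ignore_index=True` gives the same index as `reset_index(drop=True)` in a single call.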


**Merging DataFrames**

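To join two DataFrames on a common column or index, use the `pd.merge()` function. It combines exactly two frames per call and, by default, performs an inner join that keeps only the keys present in both. Here's an example: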

In [4]:
df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})

df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})

# Merge the DataFrames based on the 'key' column
merged_df = pd.merge(df1, df2, on='key')

# Print the merged DataFrame
print(merged_df)

  key   A   B   C   D
0  K0  A0  B0  C0  D0
1  K1  A1  B1  C1  D1
2  K2  A2  B2  C2  D2
3  K3  A3  B3  C3  D3
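In the example above every key appears in both frames, so the join type makes no difference. The sketch below uses two made-up frames whose keys only partly overlap to show how the `how` parameter changes the result. The `indicator=True` option adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Illustrative frames: 'K0' exists only on the left, 'K3' only on the right
left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'key': ['K1', 'K2', 'K3'], 'C': ['C1', 'C2', 'C3']})

# Inner join (the default): only keys found in both frames
inner = pd.merge(left, right, on='key')
print(inner)

# Left join: every row of `left`; missing matches become NaN
left_join = pd.merge(left, right, on='key', how='left')
print(left_join)

# Outer join: union of keys; _merge shows 'left_only', 'right_only' or 'both'
outer = pd.merge(left, right, on='key', how='outer', indicator=True)
print(outer)
```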


# **Reshaping Data**

Reshaping data means changing the layout of a dataset, without changing its values, so that it better suits a particular analysis or visualization. Typical operations are pivoting, melting, and changing the shape of multi-dimensional arrays. The pandas library provides several functions for this, including **pivot, melt, stack, and unstack.**

**Loading the Data**

In [8]:
import pandas as pd

# Load the dataset
gapminder = pd.read_csv("https://raw.githubusercontent.com/zief0002/miniature-garbanzo/main/data/gapminder.csv")
gapminder.head()

       country  region  income  income_level  life_exp    co2  co2_change  population
0  Afghanistan    Asia    2.03       Level 1      62.7  0.254    increase        37.2
1      Albania  Europe    13.3       Level 3      78.4   1.59    increase        2.88
2      Algeria  Africa    11.6       Level 3      76.0   3.69    increase        42.2
3      Andorra  Europe    58.3       Level 4      82.1   6.12    decrease       0.077
4       Angola  Africa    6.93       Level 2      64.6   1.12    decrease        30.8


**Wide to Long Format**

One common way to reshape data is to convert it from wide format to long format. In wide format, each row holds one subject (here, a country) and each variable has its own column. In long format, each row holds a single subject, variable, and value, so one subject spans several rows. `pd.melt()` keeps the `id_vars` columns as identifiers and turns every remaining column into variable/value pairs.

In [12]:
# Convert from wide to long format.
# Every column except region and country is melted, so 'value' mixes numbers and strings.
gapminder_long = pd.melt(gapminder, id_vars=["region", "country"], var_name='variable', value_name='value')

# Preview the result
gapminder_long.head()

   region      country  variable  value
0    Asia  Afghanistan    income   2.03
1  Europe      Albania    income   13.3
2  Africa      Algeria    income   11.6
3  Europe      Andorra    income   58.3
4  Africa       Angola    income   6.93
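When only some columns should be melted, `pd.melt()` accepts a `value_vars` list. The sketch below uses a small made-up frame (the country names `X` and `Y` and all values are placeholders, not Gapminder data) to show that restricting the melt to numeric columns keeps the `value` column numeric.

```python
import pandas as pd

# Illustrative wide frame: one row per country, one column per variable
wide = pd.DataFrame({
    'country': ['X', 'Y'],
    'region': ['R1', 'R2'],
    'income': [2.0, 13.0],
    'life_exp': [62.0, 78.0],
})

# Melt only the two numeric columns; 'region' is neither an id nor a value
# column here, so it is left out of the result
long = pd.melt(
    wide,
    id_vars=['country'],
    value_vars=['income', 'life_exp'],
    var_name='variable',
    value_name='value',
)
print(long)
print(long['value'].dtype)
```

The result has one row per country and variable: two countries times two variables gives four rows.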


**Long to Wide Format**


We can also convert data from long to wide format. This is useful when we want to compare multiple groups over time or summarize data by category. The example below loads a different Gapminder file, which has one row per country and year, and uses the `pivot()` function to spread the years into columns.

In [13]:
import pandas as pd

# Load the Gapminder dataset
gapminder = pd.read_csv('https://raw.githubusercontent.com/resbaz/r-novice-gapminder-files/master/data/gapminder-FiveYearData.csv')

# Convert the data from long to wide format
wide_gapminder = gapminder.pivot(index='country', columns='year', values='lifeExp')

# Display the first five rows of the new DataFrame
wide_gapminder.head()

year           1952    1957    1962    1967    1972    1977    1982    1987    1992    1997    2002    2007
country
Afghanistan  28.801  30.332  31.997   34.02  36.088  38.438  39.854  40.822  41.674  41.763  42.129  43.828
Albania       55.23   59.28   64.82   66.22   67.69   68.93   70.42    72.0  71.581   72.95  75.651  76.423
Algeria      43.077  45.685  48.303  51.407  54.518  58.014  61.368  65.799  67.744  69.152  70.994  72.301
Angola       30.015  31.999    34.0  35.985  37.928  39.483  39.942  39.906  40.647  40.963  41.003  42.731
Argentina    62.485  64.399  65.142  65.634  67.065  68.481  69.942  70.774  71.868  73.275   74.34   75.32
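`pivot()` only works when each index/column pair occurs once. If the long data contains duplicates, it raises a `ValueError`, and `pivot_table()` with an aggregation function is the usual alternative. The following sketch shows both behaviours on a small made-up frame. The values are placeholders, not Gapminder data.

```python
import pandas as pd

# Illustrative long frame in which the pair ('Y', 1957) appears twice
long = pd.DataFrame({
    'country': ['X', 'X', 'Y', 'Y', 'Y'],
    'year': [1952, 1957, 1952, 1957, 1957],
    'lifeExp': [30.0, 32.0, 55.0, 59.0, 61.0],
})

# pivot() cannot decide which of the duplicate values to keep
pivot_failed = False
try:
    long.pivot(index='country', columns='year', values='lifeExp')
except ValueError as err:
    pivot_failed = True
    print('pivot failed:', err)

# pivot_table() resolves duplicates by aggregating them (here: the mean)
wide = long.pivot_table(index='country', columns='year',
                        values='lifeExp', aggfunc='mean')
print(wide)
```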


**Stacking**

The `stack()` method moves the column labels into a new innermost level of the row index. When all columns are stacked, as in the example below, the result is a Series with a multi-level index rather than a DataFrame.

In [14]:
import pandas as pd

# Create a multi-level index dataframe
df = pd.DataFrame({'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
                   'B': ['one', 'one', 'two', 'two', 'one', 'one'],
                   'C': ['x', 'y', 'x', 'y', 'x', 'y'],
                   'D': [1, 2, 3, 4, 5, 6],
                   'E': [7, 8, 9, 10, 11, 12]})

df.set_index(['A', 'B', 'C'], inplace=True)

print(df)


           D   E
A   B   C       
foo one x  1   7
        y  2   8
    two x  3   9
bar two y  4  10
    one x  5  11
        y  6  12


In [15]:
stacked_df = df.stack()

print(stacked_df)

A    B    C   
foo  one  x  D     1
             E     7
          y  D     2
             E     8
     two  x  D     3
             E     9
bar  two  y  D     4
             E    10
     one  x  D     5
             E    11
          y  D     6
             E    12
dtype: int64
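To make the shape change explicit, the sketch below stacks a small made-up DataFrame and inspects the result. It is a reduced version of the example above with a single index level `C`. The names are chosen for illustration only.

```python
import pandas as pd

# Illustrative frame: index level 'C', columns 'D' and 'E'
df = pd.DataFrame({'D': [1, 2], 'E': [7, 8]},
                  index=pd.Index(['x', 'y'], name='C'))

stacked = df.stack()

# The two columns are gone; their labels now form the inner index level
print(type(stacked).__name__)
print(stacked.index.nlevels)

# Individual values are addressed by (row label, former column label)
print(stacked.loc[('x', 'E')])
```

A frame with 2 rows and 2 columns becomes a Series with 4 entries, one per original cell.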


**Unstacking**

The `unstack()` method is the inverse of `stack()`: it moves the innermost level of the row index back into columns. The result is sorted by the remaining index levels, which is why `bar` appears before `foo` in the output below.

In [16]:
unstacked_df = stacked_df.unstack()

print(unstacked_df)

           D   E
A   B   C       
bar one x  5  11
        y  6  12
    two y  4  10
foo one x  1   7
        y  2   8
    two x  3   9
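By default `unstack()` pivots the innermost index level, but its `level` argument selects any level by name or position. The sketch below uses a small made-up Series with index levels `A` and `C` to compare the two choices.

```python
import pandas as pd

# Illustrative Series with a two-level index (A, C)
index = pd.MultiIndex.from_tuples(
    [('foo', 'x'), ('foo', 'y'), ('bar', 'x'), ('bar', 'y')],
    names=['A', 'C'],
)
s = pd.Series([1, 2, 5, 6], index=index)

# Default: the innermost level 'C' becomes the columns, 'A' stays as rows
by_c = s.unstack()
print(by_c)

# level='A': the outer level becomes the columns instead, 'C' stays as rows
by_a = s.unstack(level='A')
print(by_a)
```

Both results hold the same four values. Only the choice of which level labels the rows and which labels the columns differs.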
