# Task 15: Data Wrangling: Join, Combine, and Reshape.

## Data Wrangling

What is 'Data Wrangling'?

Data Wrangling is the process of transforming and consolidating data from various sources into a format that is suitable for analysis. It involves tasks such as cleaning, filtering, merging, aggregating, reshaping, and transforming data.

## Importing Libraries

In [25]:
import pandas as pd
import numpy as np



## Importing Data

In [26]:
gameData1 = pd.read_csv("computer_games.csv")
gameData2 = pd.read_csv("games.csv")


## Exploring Data

In [52]:
# gameData1 Columns
print("\ngameData1 Columns\n")
print(gameData1.columns)

# gameData1 Head
print("\ngameData1 Head\n")
print(gameData1.head())

# gameData1 Tail
print("\ngameData1 Tail\n")
print(gameData1.tail())

# gameData1 Types
print("\ngameData1 Types\n")
print(gameData1.dtypes)

# gameData1 Shape
print("\ngameData1 Shape\n")
print(gameData1.shape)

# gameData1 Summary
print("\ngameData1 Summary\n")
print(gameData1.describe())

# gameData1 Info
print("\ngameData1 Info\n")
gameData1.info()  # info() prints its own report and returns None, so it is not wrapped in print()


# Finding if there are missing values
print("\nMissing Values\n")
print(gameData1.isnull().sum())



# gameData2 Columns (renamed first so they match gameData1's column names)
print("\ngameData2 Columns\n")
gameData2.rename(columns={"Title": "Name", "Release Date": "Date Released"}, inplace=True)
print(gameData2.columns)

# gameData2 Head
print("\ngameData2 Head\n")
print(gameData2.head())

# gameData2 Tail
print("\ngameData2 Tail\n")
print(gameData2.tail())

# gameData2 Types
print("\ngameData2 Types\n")
print(gameData2.dtypes)

# gameData2 Shape
print("\ngameData2 Shape\n")
print(gameData2.shape)

# gameData2 Summary
print("\ngameData2 Summary\n")
print(gameData2.describe())

# gameData2 Info
print("\ngameData2 Info\n")
gameData2.info()  # info() prints its own report and returns None, so it is not wrapped in print()


# Finding if there are missing values
print("\nMissing Values\n")
print(gameData2.isnull().sum())








gameData1 Columns

Index(['Name', 'Developer', 'Producer', 'Genre', 'Operating System',
       'Date Released'],
      dtype='object')

gameData1 Head

                                 Name                        Developer  \
0                             A-Men 2                     Bloober Team   
1                             A-Train                          Artdink   
2                          A-10 Cuba!              Parsoft Interactive   
3                           A.D. 2044                  R.M.P. Software   
4  A.D.A.M. Life's Greatest Mysteries  Columbia Healthcare Corporation   

                          Producer                    Genre  \
0                     Bloober Team        Adventure, Puzzle   
1   Artdink, Maxis, Ocean Software  Vehicle Simulation Game   
2                       Activision         Flight simulator   
3                        LK Avalon                Adventure   
4  Columbia Healthcare Corporation              Educational   

    Operating System   

<h2>1. Merge two DataFrames on a single key.</h2>

**Merging DataFrames on a column called "Name"**

In [53]:
# Making Copy Of Data
dataFrame = gameData1.copy()

# Making 2 DataFrames
dataFrame1 = dataFrame[["Name","Developer","Genre"]]
dataFrame2 = dataFrame[["Name",'Producer','Operating System','Date Released']]

# Merging DataFrames
mergedDataFrame = dataFrame1.merge(dataFrame2,on="Name")

print(mergedDataFrame)

                                    Name                        Developer  \
0                                A-Men 2                     Bloober Team   
1                                A-Train                          Artdink   
2                             A-10 Cuba!              Parsoft Interactive   
3                              A.D. 2044                  R.M.P. Software   
4     A.D.A.M. Life's Greatest Mysteries  Columbia Healthcare Corporation   
...                                  ...                              ...   
1092                688(I) Hunter/Killer                   Sonalysts Inc.   
1093                            7 Colors                       Gamos Ltd.   
1094                                7554                      Emobi Games   
1095                          7th Legion           Vision, Epic MegaGames   
1096                  9: The Last Resort              Tribeca Interactive   

                        Genre                         Producer  \
0        

<h2>2. Merge two DataFrames on multiple keys.</h2>

**Merging DataFrames on multiple columns called "Name" and "Date Released"**
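
To see why adding a second key can change the result, here is a minimal sketch on a made-up table (the names `games`, `by_name`, and `by_both`, and all values, are hypothetical and not taken from the CSV files). When a key value repeats, merging on that key alone pairs every matching left row with every matching right row; adding a second key that distinguishes the repeats avoids the extra rows.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical data: "Alpha" appears twice with different release dates.
games = pd.DataFrame({
    "Name": ["Alpha", "Alpha", "Beta"],
    "Date Released": ["1995", "2010", "2001"],
    "Developer": ["Dev 1", "Dev 2", "Dev 3"],
    "Producer": ["Pub 1", "Pub 2", "Pub 3"],
})
left = games[["Name", "Date Released", "Developer"]]
right = games[["Name", "Date Released", "Producer"]]

# Single key: the two "Alpha" rows match each other in all combinations (2 x 2).
by_name = left.merge(right, on="Name")
print(len(by_name))  # 5

# Two keys: each row matches only its exact counterpart.
by_both = left.merge(right, on=["Name", "Date Released"])
print(len(by_both))  # 3

# validate= makes pandas raise instead of silently duplicating rows.
try:
    left.merge(right, on="Name", validate="one_to_one")
    duplicate_keys_detected = False
except MergeError:
    duplicate_keys_detected = True
print(duplicate_keys_detected)  # True
```

Repeated names with different release dates are one plausible reason the single-key and multi-key outputs in this notebook end at different row indices (1096 versus 1094), though that has not been checked against the data here.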


In [54]:
# Using Above Copy Of Data

# Making 2 DataFrames
dataFrame1 = dataFrame[['Name', 'Developer', 'Date Released']]
dataFrame2 = dataFrame[['Name', 'Producer', 'Operating System', 'Date Released','Genre']]

#Merging DataFrames
mergedDataFrame = dataFrame1.merge(dataFrame2, on=['Name', 'Date Released'])
print(mergedDataFrame)


                                    Name                        Developer  \
0                                A-Men 2                     Bloober Team   
1                                A-Train                          Artdink   
2                             A-10 Cuba!              Parsoft Interactive   
3                              A.D. 2044                  R.M.P. Software   
4     A.D.A.M. Life's Greatest Mysteries  Columbia Healthcare Corporation   
...                                  ...                              ...   
1090                688(I) Hunter/Killer                   Sonalysts Inc.   
1091                            7 Colors                       Gamos Ltd.   
1092                                7554                      Emobi Games   
1093                          7th Legion           Vision, Epic MegaGames   
1094                  9: The Last Resort              Tribeca Interactive   

           Date Released                         Producer  \
0          Jun

<h2>3. Perform an outer join, inner join, left join, and right join.</h2>

What is a *join*?

A *join* is an operation that combines columns from two or more tables based on a related column between them. It is commonly used to merge DataFrames or tables for analyses that require data from multiple sources.

There are four types of join, each serving a different purpose depending on how you want to combine the data:
- inner join
- outer join
- left join
- right join

1. Inner Join 
Returns only the rows that have matching values in both DataFrames.

2. Outer Join
Returns all the rows from both DataFrames, with NaN where there is no match.

3. Left Join
Returns all rows from the left DataFrame and the matching rows from the right DataFrame. Rows from the left DataFrame without a match in the right DataFrame get NaN for columns from the right DataFrame.

4. Right Join
Returns all rows from the right DataFrame and the matching rows from the left DataFrame. Rows from the right DataFrame without a match in the left DataFrame get NaN for columns from the left DataFrame.
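
The four join types described above can be compared side by side on two tiny tables. This is a minimal sketch with made-up data (the frames `left` and `right` and their values are hypothetical); `indicator=True` adds a `_merge` column showing whether each row came from the left table only, the right table only, or both.

```python
import pandas as pd

# Hypothetical toy tables: "Beta" and "Gamma" appear in both,
# "Alpha" only on the left, "Delta" only on the right.
left = pd.DataFrame({
    "Name": ["Alpha", "Beta", "Gamma"],
    "Developer": ["Dev 1", "Dev 2", "Dev 3"],
})
right = pd.DataFrame({
    "Name": ["Beta", "Gamma", "Delta"],
    "Rating": [4.2, 4.3, 4.5],
})

row_counts = {}
for how in ["inner", "outer", "left", "right"]:
    joined = pd.merge(left, right, on="Name", how=how, indicator=True)
    row_counts[how] = len(joined)
    print(f"\n{how} join: {len(joined)} rows")
    print(joined)
```

With this data the inner join keeps 2 rows, the outer join 4, and the left and right joins 3 each; the unmatched rows carry NaN in the columns that come from the other table.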



<h3> Inner Join</h3>

In [37]:
dataFrame1 = gameData1.copy()
dataFrame2 = gameData2.copy()

# Inner Join
joinedDataFrame = pd.merge(dataFrame1, dataFrame2,on='Name',how='inner')

print(joinedDataFrame)

# Returns only the rows whose 'Name' appears in both DataFrames.



                          Name              Developer  \
0                A Hat in Time    Gears for Breakfast   
1                    Alan Wake   Remedy Entertainment   
2    Amnesia: The Dark Descent       Frictional Games   
3    Amnesia: The Dark Descent       Frictional Games   
4                 Apex Legends  Respawn Entertainment   
..                         ...                    ...   
288        Yakuza 3 Remastered                   SEGA   
289              Yakuza Kiwami                   SEGA   
290              Yakuza Kiwami                   SEGA   
291              Yakuza Kiwami                   SEGA   
292      Yakuza: Like a Dragon   Ryu ga Gotoku Studio   

                                              Producer  \
0                                  Gears for Breakfast   
1    Microsoft Game Studios, Remedy Entertainment, ...   
2                                     Frictional Games   
3                                     Frictional Games   
4                        

<h3> Outer Join</h3>

In [42]:
dataFrame1 = gameData1.copy()
dataFrame2 = gameData2.copy()

# Outer Join
joinedDataFrame = pd.merge(dataFrame1, dataFrame2,on='Name',how='outer')

print(joinedDataFrame)

# Returns all rows from both DataFrames, matched on 'Name'; columns with no match on the other side are filled with NaN.





                        Name            Developer                 Producer  \
0                     0 A.D.       Wildfire Games           Wildfire Games   
1                007 Legends              Eurocom               Activision   
2       007: Licence to Kill               Quixel                   Domark   
3     007: Quantum of Solace     Treyarch, Beenox  Activision, Square Enix   
4         1-0 Soccer Manager     New Era Software             Wizard Games   
...                      ...                  ...                      ...   
2415                kkrieger         .theprodukkt             .theprodukkt   
2416                    osu!         Dean Herbert     PPY Developments PTY   
2417             ÜberSoldier  Burut Creative Team      Burut Creative Team   
2418                Ōkami HD                  NaN                      NaN   
2419                Ōkami HD                  NaN                      NaN   

                     Genre                 Operating System  \


<h3> Left Join</h3>

In [55]:
dataFrame1 = gameData1.copy()
dataFrame2 = gameData2.copy()

# Left Join
joinedDataFrame = pd.merge(dataFrame1, dataFrame2,on='Name',how='left')


print(joinedDataFrame)

# Returns all rows of the left DataFrame plus the matching rows from the right DataFrame based on 'Name'; unmatched right-hand columns are NaN.







                                    Name                        Developer  \
0                                A-Men 2                     Bloober Team   
1                                A-Train                          Artdink   
2                             A-10 Cuba!              Parsoft Interactive   
3                              A.D. 2044                  R.M.P. Software   
4     A.D.A.M. Life's Greatest Mysteries  Columbia Healthcare Corporation   
...                                  ...                              ...   
1195                688(I) Hunter/Killer                   Sonalysts Inc.   
1196                            7 Colors                       Gamos Ltd.   
1197                                7554                      Emobi Games   
1198                          7th Legion           Vision, Epic MegaGames   
1199                  9: The Last Resort              Tribeca Interactive   

                             Producer                    Genre  \
0        

<h3> Right Join</h3>

In [56]:
dataFrame1 = gameData1.copy()
dataFrame2 = gameData2.copy()

# Right Join
joinedDataFrame = pd.merge(dataFrame1, dataFrame2,on='Name',how='right')

print(joinedDataFrame)

# Returns all rows of the right DataFrame plus the matching rows from the left DataFrame based on 'Name'; unmatched left-hand columns are NaN.







                                         Name    Developer     Producer  \
0                                  Elden Ring          NaN          NaN   
1                                       Hades          NaN          NaN   
2     The Legend of Zelda: Breath of the Wild          NaN          NaN   
3                                   Undertale     Toby Fox     Toby Fox   
4                               Hollow Knight  Team Cherry  Team Cherry   
...                                       ...          ...          ...   
1508             Back to the Future: The Game          NaN          NaN   
1509                        Team Sonic Racing          NaN          NaN   
1510                           Dragon's Dogma       Capcom       Capcom   
1511                          Baldur's Gate 3          NaN          NaN   
1512                 The LEGO Movie Videogame          NaN          NaN   

                                    Genre  \
0                                     NaN   
1        

<h2>4. Concatenate two DataFrames along rows.</h2>

In [57]:
dataFrame1 = gameData1.copy()
dataFrame2 = gameData2.copy()

# Concatenate DataFrames along rows
concatenatedDataFrame = pd.concat([dataFrame1, dataFrame2],axis=0)

print(concatenatedDataFrame.head())

                                 Name                        Developer  \
0                             A-Men 2                     Bloober Team   
1                             A-Train                          Artdink   
2                          A-10 Cuba!              Parsoft Interactive   
3                           A.D. 2044                  R.M.P. Software   
4  A.D.A.M. Life's Greatest Mysteries  Columbia Healthcare Corporation   

                          Producer                    Genre  \
0                     Bloober Team        Adventure, Puzzle   
1   Artdink, Maxis, Ocean Software  Vehicle Simulation Game   
2                       Activision         Flight simulator   
3                        LK Avalon                Adventure   
4  Columbia Healthcare Corporation              Educational   

    Operating System      Date Released Team  Rating Times Listed  \
0  Microsoft Windows      June 24, 2015  NaN     NaN          NaN   
1       Windows, Mac               198
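
The two concatenation directions (rows here, columns in the next section) can be compared on small frames. This is a minimal sketch with made-up data; the frames `a` and `b` are hypothetical, and `ignore_index=True` is an optional extra that is not used in the cells of this notebook.

```python
import pandas as pd

# Hypothetical frames that share only the "Name" column.
a = pd.DataFrame({"Name": ["Alpha", "Beta"], "Developer": ["Dev 1", "Dev 2"]})
b = pd.DataFrame({"Name": ["Gamma"], "Rating": [4.0]})

# axis=0 stacks rows; columns missing from one frame are filled with NaN.
# ignore_index=True renumbers the rows 0..n-1 instead of keeping 0, 1, 0.
stacked = pd.concat([a, b], axis=0, ignore_index=True)
print(stacked)

# axis=1 places the frames side by side, aligned on the row index.
# Shared column names are NOT merged, so "Name" appears twice.
side_by_side = pd.concat([a, b], axis=1)
print(side_by_side)
```

Unlike `merge`, `concat` does not match rows by a key column: along `axis=1` it aligns on the index only, so row 0 of one frame sits next to row 0 of the other regardless of content.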

<h2>5. Concatenate two DataFrames along columns.</h2>

In [61]:
dataFrame1 = gameData1.copy()
dataFrame2 = gameData2.copy()

# Concatenate DataFrames along columns
concatenatedDataFrame = pd.concat([dataFrame1, dataFrame2],axis=1)

print(concatenatedDataFrame.head())

                                 Name                        Developer  \
0                             A-Men 2                     Bloober Team   
1                             A-Train                          Artdink   
2                          A-10 Cuba!              Parsoft Interactive   
3                           A.D. 2044                  R.M.P. Software   
4  A.D.A.M. Life's Greatest Mysteries  Columbia Healthcare Corporation   

                          Producer                    Genre  \
0                     Bloober Team        Adventure, Puzzle   
1   Artdink, Maxis, Ocean Software  Vehicle Simulation Game   
2                       Activision         Flight simulator   
3                        LK Avalon                Adventure   
4  Columbia Healthcare Corporation              Educational   

    Operating System      Date Released  \
0  Microsoft Windows      June 24, 2015   
1       Windows, Mac               1985   
2       Windows, Mac  November 30, 1996   
3  M

<h2>6. Concatenate a list of DataFrames.</h2>

In [66]:
# Making 4 DataFrames with different columns

dataFrame1 = gameData1[['Name', 'Developer', 'Genre']].copy()
dataFrame2 = gameData1[['Producer', 'Date Released', 'Operating System']].copy()
dataFrame3 = gameData2[['Name', 'Summary', 'Genres']].copy()
dataFrame4 = gameData2[['Team', 'Date Released', 'Rating']].copy()


# Concatenate DataFrames along rows

concatenatedDataFrame = pd.concat([dataFrame1, dataFrame2, dataFrame3, dataFrame4], axis=0)

print(concatenatedDataFrame.head())


                                 Name                        Developer  \
0                             A-Men 2                     Bloober Team   
1                             A-Train                          Artdink   
2                          A-10 Cuba!              Parsoft Interactive   
3                           A.D. 2044                  R.M.P. Software   
4  A.D.A.M. Life's Greatest Mysteries  Columbia Healthcare Corporation   

                     Genre Producer Date Released Operating System Summary  \
0        Adventure, Puzzle      NaN           NaN              NaN     NaN   
1  Vehicle Simulation Game      NaN           NaN              NaN     NaN   
2         Flight simulator      NaN           NaN              NaN     NaN   
3                Adventure      NaN           NaN              NaN     NaN   
4              Educational      NaN           NaN              NaN     NaN   

  Genres Team  Rating  
0    NaN  NaN     NaN  
1    NaN  NaN     NaN  
2    NaN  NaN 

<h2>7. Reshape data using the melt function to go from wide to long format.</h2>

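What does the melt() function do?

The melt() function reshapes a DataFrame from wide to long format. The columns listed in id_vars are kept as identifiers, and every remaining column is unpivoted into a pair of columns: one holding the original column name and one holding its value.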
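
A minimal sketch on a made-up two-row table may make the reshaping easier to follow (the frame `wide` and its values are hypothetical). The `pivot` call at the end is not part of this notebook; it is included only to show that the long format holds the same information.

```python
import pandas as pd

# Hypothetical wide table: one row per game, one column per attribute.
wide = pd.DataFrame({
    "Name": ["Alpha", "Beta"],
    "Producer": ["Studio X", "Studio Y"],
    "Operating System": ["Windows", "Linux"],
})

# Keep "Name" as the identifier; unpivot the other two columns.
long = pd.melt(wide, id_vars=["Name"], var_name="Attribute", value_name="Value")
print(long)  # 4 rows: 2 games x 2 melted columns

# Going back from long to wide with pivot recovers the original values.
restored = long.pivot(index="Name", columns="Attribute", values="Value")
print(restored)
```

Each melted column contributes one row per original row, so the long table has (number of rows) x (number of non-id columns) rows.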

In [67]:
dataFrame1 = gameData1.copy()

# Reshape DataFrame from wide to long format

meltedDataFrame = pd.melt(dataFrame1, id_vars=['Name', 'Developer', 'Genre'], var_name='Attribute', value_name='Value')

print(meltedDataFrame.head())

# Each column not listed in id_vars is turned into rows: its name goes into 'Attribute' and its values into 'Value'.

                                 Name                        Developer  \
0                             A-Men 2                     Bloober Team   
1                             A-Train                          Artdink   
2                          A-10 Cuba!              Parsoft Interactive   
3                           A.D. 2044                  R.M.P. Software   
4  A.D.A.M. Life's Greatest Mysteries  Columbia Healthcare Corporation   

                     Genre Attribute                            Value  
0        Adventure, Puzzle  Producer                     Bloober Team  
1  Vehicle Simulation Game  Producer   Artdink, Maxis, Ocean Software  
2         Flight simulator  Producer                       Activision  
3                Adventure  Producer                        LK Avalon  
4              Educational  Producer  Columbia Healthcare Corporation  
