# Objectives

At the end of the experiment you will be able to:

* perform data cleaning, manipulation and pre-processing using Pandas

## Dataset

In this experiment you will use a movies dataset. The dataset contains 77 rows and 8 columns. The columns are as follows:

* Film
* Genre
     - Romance, Comedy, Drama, Animation, Fantasy, Action
* Lead Studio
* Audience score in percentage
* Profitability
* Rotten Tomatoes in percentage
* Worldwide Gross
* Year

## Features of Pandas

* Fast and efficient DataFrame object with default and customized indexing.
* Tools for loading data into in-memory data objects from different file formats.
* Data alignment and integrated handling of missing data.
* Reshaping and pivoting of data sets.
* Label-based slicing, indexing and subsetting of large data sets.
* Columns from a data structure can be deleted or inserted.
* Group by data for aggregation and transformations.
* High performance merging and joining of data.
* Time Series functionality.

Now it is time for some practice. The exercises are given below:

In [1]:
import pandas as pd

In [2]:
moviesdata = pd.read_csv("movies.csv")

In [3]:
moviesdata.head()

Unnamed: 0,Film,Genre,Lead Studio,Audience score %,Profitability,Rotten Tomatoes %,Worldwide Gross,Year
0,Zack and Miri Make a Porno,Romance,The Weinstein Company,70,1.747542,64,$41.94,2008
1,Youth in Revolt,Comedy,The Weinstein Company,52,1.09,68,$19.62,2010
2,You Will Meet a Tall Dark Stranger,Comedy,Independent,35,1.211818,43,$26.66,2010
3,When in Rome,Comedy,Disney,44,0.0,15,$43.04,2010
4,What Happens in Vegas,Comedy,Fox,72,6.267647,28,$219.37,2008


### Exercise 1: How many movies are listed in the moviesdata dataframe?

In [4]:
len(moviesdata.Film.unique())

75
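The count above is based on unique titles, so a film that appears in more than one row is counted once. As a minimal sketch, assuming a small made-up frame (`toy` and its values are illustrative and not taken from `movies.csv`), the row count, the unique-title count and the number of fully duplicated rows can be compared side by side:

```python
import pandas as pd

# Hypothetical toy data, not taken from movies.csv.
toy = pd.DataFrame({
    "Film": ["A", "B", "B", "C"],
    "Year": [2008, 2009, 2009, 2010],
})

total_rows = len(toy)                          # every row, duplicates included
unique_films = toy["Film"].nunique()           # distinct titles (missing values are ignored)
duplicate_rows = int(toy.duplicated().sum())   # rows that exactly repeat an earlier row

print(total_rows, unique_films, duplicate_rows)
```

If the row count and the unique count differ, the difference is worth inspecting before answering "how many movies" questions.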

### Exercise 2: How many movies were made in the year 2009?

In [12]:
len(moviesdata[moviesdata["Year"] == 2009])

12

### Exercise 3: Which year saw the most films released?

In [22]:
# mode() returns every value tied for the highest count, so iterate over all of them
for year in moviesdata['Year'].mode():
    print(year)

2008
2010


In [None]:
moviesdata[moviesdata["Year"] == 2008].shape

In [19]:
moviesdata[moviesdata["Year"] == 2010].shape

(20, 8)

In [21]:
moviesdata[moviesdata["Year"] == 2007].shape

(11, 8)

In [22]:
moviesdata[moviesdata["Year"] == 2011].shape

(14, 8)
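The per-year checks above can also be done in a single call. The following is a minimal sketch on a made-up frame (`toy` is hypothetical, not the real dataset) that uses `value_counts()` to get the number of films per year and then keeps every year tied for the highest count:

```python
import pandas as pd

# Hypothetical toy data, not taken from movies.csv.
toy = pd.DataFrame({"Year": [2008, 2010, 2008, 2010, 2009]})

counts = toy["Year"].value_counts()   # films per year, most frequent first

# Keep every year whose count equals the maximum, so ties are not lost.
top_years = sorted(counts[counts == counts.max()].index.tolist())

print(top_years)
```

Unlike indexing `mode()` with a fixed position, this does not assume how many years are tied.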

### Exercise 4: How many movies were made from 2010 through 2011?

In [23]:
len(moviesdata[moviesdata["Year"] >= 2010])

34
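The filter `>= 2010` gives the right answer only as long as the data has no year later than 2011. A more explicit alternative, sketched here on a made-up frame (`toy` is hypothetical), is `Series.between`, which is inclusive on both ends by default:

```python
import pandas as pd

# Hypothetical toy data, not taken from movies.csv.
toy = pd.DataFrame({"Year": [2009, 2010, 2011, 2012]})

in_range = toy["Year"].between(2010, 2011)   # True for 2010 and 2011 only
count = int(in_range.sum())                  # True counts as 1 when summed

print(count)
```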

In [31]:
moviesdata["Genre"].unique()

array(['Romance', 'Comedy', 'Drama', 'Animation', 'Fantasy', 'Romence',
       'Comdy', 'Action', 'romance', 'comedy'], dtype=object)

### Exercise 5: In the Genre column, Comedy is sometimes spelled Comdy or comedy, and Romance is sometimes spelled romance or Romence. Replace all the misspelled values with the correct ones.

In [24]:
moviesdata['Genre'].unique()

array(['Romance', 'Comedy', 'Drama', 'Animation', 'Fantasy', 'Romence',
       'Comdy', 'Action', 'romance', 'comedy'], dtype=object)

In [25]:
moviesdata['Genre'] = moviesdata["Genre"].str.replace("Comdy", "Comedy")

In [26]:
moviesdata['Genre'] = moviesdata["Genre"].str.replace("comedy", "Comedy")

In [27]:
moviesdata['Genre'] = moviesdata["Genre"].str.replace("Romence", "Romance")

In [28]:
moviesdata['Genre'] = moviesdata["Genre"].str.replace("romance", "Romance")

In [38]:
moviesdata["Genre"].unique()

array(['Romance', 'Comedy', 'Drama', 'Animation', 'Fantasy', 'Action'],
      dtype=object)
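The four `str.replace` cells above can be folded into one step. As a sketch, assuming a made-up frame (`toy` and the `corrections` dict are illustrative names), `Series.replace` accepts a dictionary and matches whole values rather than substrings:

```python
import pandas as pd

# Hypothetical toy data, not taken from movies.csv.
toy = pd.DataFrame({"Genre": ["Comdy", "comedy", "Romence", "romance", "Drama"]})

# Map each misspelling to its correct label; values not in the dict stay unchanged.
corrections = {
    "Comdy": "Comedy",
    "comedy": "Comedy",
    "Romence": "Romance",
    "romance": "Romance",
}
toy["Genre"] = toy["Genre"].replace(corrections)

print(toy["Genre"].unique().tolist())
```

Keeping the corrections in one dictionary makes it easy to add a new misspelling later without adding another cell.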

### Exercise 6: How many Comedy genre movies were released in 2009?

In [29]:
len(moviesdata[(moviesdata["Year"] == 2009) & (moviesdata['Genre'] == "Comedy") ])

7
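The parentheses around each comparison in the cell above matter, because `&` has higher precedence than `==` in Python. A small sketch on made-up data (`toy` is hypothetical) that builds the combined mask separately and counts its `True` values:

```python
import pandas as pd

# Hypothetical toy data, not taken from movies.csv.
toy = pd.DataFrame({
    "Year": [2009, 2009, 2010, 2009],
    "Genre": ["Comedy", "Drama", "Comedy", "Comedy"],
})

# Each condition is wrapped in parentheses before combining with &.
mask = (toy["Year"] == 2009) & (toy["Genre"] == "Comedy")
comedy_2009 = int(mask.sum())   # number of rows satisfying both conditions

print(comedy_2009)
```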

### Exercise 7: Which year has the most Romance genre movie releases?

In [33]:
moviesdata[moviesdata['Genre'] == "Romance"]['Year'].mode()[0]

2011

### Exercise 8: How many films were made by the Sony lead studio?

In [34]:
len(moviesdata[moviesdata['Lead Studio'] == "Sony"])

4

### Exercise 9: How many films have an Audience score of less than 50%?

In [35]:
len(moviesdata[moviesdata['Audience score %'] < 50])

16

### Exercise 10: Display all the movies made by the Universal lead studio.

In [36]:
moviesdata[moviesdata['Lead Studio'] == "Universal"]

Unnamed: 0,Film,Genre,Lead Studio,Audience score %,Profitability,Rotten Tomatoes %,Worldwide Gross,Year
45,Mamma Mia!,Comedy,Universal,76,9.234454,53,$609.47,2008
46,Mamma Mia!,Comedy,Universal,76,9.234454,53,$609.47,2008
48,Love Happens,Drama,Universal,40,2.004444,18,$36.08,2009
53,Leap Year,Comedy,Universal,49,1.715263,21,$32.59,2010
54,Knocked Up,Comedy,Universal,83,6.636402,91,$219,2007
57,Jane Eyre,Romance,Universal,77,0.0,85,$30.15,2011
58,It's Complicated,Comedy,Universal,63,2.642353,56,$224.60,2009
73,A Serious Man,Drama,Universal,64,4.382857,89,$30.68,2009
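The output above lists Mamma Mia! twice with identical values. If such repeats are unwanted, one possible cleaning step is `drop_duplicates()`. This is a sketch on a made-up frame (`toy` is hypothetical); whether to drop duplicates in the real dataset is a judgment call this notebook does not make:

```python
import pandas as pd

# Hypothetical toy data that mimics a repeated row.
toy = pd.DataFrame({
    "Film": ["Mamma Mia!", "Mamma Mia!", "Leap Year"],
    "Year": [2008, 2008, 2010],
})

# Drop rows that exactly repeat an earlier row, then renumber the index.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(len(toy), len(deduped))
```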


In [39]:
moviesdata['Worldwide Gross'] = [float(value.lstrip('$')) for value in moviesdata['Worldwide Gross']]
moviesdata

Unnamed: 0,Film,Genre,Lead Studio,Audience score %,Profitability,Rotten Tomatoes %,Worldwide Gross,Year
0,Zack and Miri Make a Porno,Romance,The Weinstein Company,70,1.747542,64,41.94,2008
1,Youth in Revolt,Comedy,The Weinstein Company,52,1.090000,68,19.62,2010
2,You Will Meet a Tall Dark Stranger,Comedy,Independent,35,1.211818,43,26.66,2010
3,When in Rome,Comedy,Disney,44,0.000000,15,43.04,2010
4,What Happens in Vegas,Comedy,Fox,72,6.267647,28,219.37,2008
...,...,...,...,...,...,...,...,...
72,Across the Universe,Romance,Independent,84,0.652603,54,29.37,2007
73,A Serious Man,Drama,Universal,64,4.382857,89,30.68,2009
74,A Dangerous Method,Drama,Independent,89,0.448645,79,8.97,2011
75,27 Dresses,Comedy,Fox,71,5.343622,40,160.31,2008


In [37]:
def is_hit(row):
    """Label a film 'Hit' if its audience score is above 75 and its worldwide gross is above 500."""
    if row['Audience score %'] > 75 and row['Worldwide Gross'] > 500:
        return 'Hit'
    return 'Average'

In [42]:
moviesdata['Hit / Average'] = moviesdata.apply(is_hit, axis=1)

In [44]:
moviesdata[moviesdata['Hit / Average'] == 'Hit']

Unnamed: 0,Film,Genre,Lead Studio,Audience score %,Profitability,Rotten Tomatoes %,Worldwide Gross,Year,Hit / Average
6,WALL-E,Animation,Disney,89,2.896019,96,521.28,2008,Hit
14,The Twilight Saga: New Moon,Drama,Summit,78,14.1964,27,709.82,2009,Hit
45,Mamma Mia!,Comedy,Universal,76,9.234454,53,609.47,2008,Hit
46,Mamma Mia!,Comedy,Universal,76,9.234454,53,609.47,2008,Hit
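Row-wise `apply` works, but the same labelling can be expressed on whole columns. The following sketch uses `numpy.where` on a made-up frame (`toy` and `is_hit_mask` are illustrative names), with the same thresholds as `is_hit`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from movies.csv.
toy = pd.DataFrame({
    "Audience score %": [89, 76, 70, 80],
    "Worldwide Gross": [521.28, 609.47, 41.94, 300.0],
})

# Both conditions must hold for a film to be labelled 'Hit'.
is_hit_mask = (toy["Audience score %"] > 75) & (toy["Worldwide Gross"] > 500)

# Pick 'Hit' where the mask is True and 'Average' elsewhere.
toy["Hit / Average"] = np.where(is_hit_mask, "Hit", "Average")

print(toy["Hit / Average"].tolist())
```

On a dataset of this size the difference in speed is negligible; the column-wise form mainly keeps the condition visible in one expression.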
