# Parallels between SQL & Pandas
> An introduction to Pandas that draws parallels between SQL and Pandas

- toc: true 
- badges: true
- comments: true
- categories: [pandas,sql]
- image: images/chart-preview.png

In [3]:
import pandas as pd

In [4]:
movies = 'https://vega.github.io/vega-datasets/data/movies.json'

## SELECT
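In SQL, `SELECT` is the statement that retrieves rows and columns from a table. In Pandas, the equivalent is indexing into a DataFrame. The SQL examples here use SQL Server (T-SQL) syntax, such as `TOP` and bracketed column names.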

### Select all columns
```sql
SELECT TOP 5 * FROM movies
```

In [9]:
df = pd.read_json(movies) # load movies data
df.head() # shows the first 5 rows by default; pass a number to see more, e.g. df.head(10)

Unnamed: 0,Title,US Gross,Worldwide Gross,US DVD Sales,Production Budget,Release Date,MPAA Rating,Running Time min,Distributor,Source,Major Genre,Creative Type,Director,Rotten Tomatoes Rating,IMDB Rating,IMDB Votes
0,The Land Girls,146083.0,146083.0,,8000000.0,Jun 12 1998,R,,Gramercy,,,,,,6.1,1071.0
1,"First Love, Last Rites",10876.0,10876.0,,300000.0,Aug 07 1998,R,,Strand,,Drama,,,,6.9,207.0
2,I Married a Strange Person,203134.0,203134.0,,250000.0,Aug 28 1998,,,Lionsgate,,Comedy,,,,6.8,865.0
3,Let's Talk About Sex,373615.0,373615.0,,300000.0,Sep 11 1998,,,Fine Line,,Comedy,,,13.0,,
4,Slam,1009819.0,1087521.0,,1000000.0,Oct 09 1998,R,,Trimark,Original Screenplay,Drama,Contemporary Fiction,,62.0,3.4,165.0


### Select a few columns
```sql
SELECT TOP 5 Title, [IMDB Rating] FROM movies
```

In [7]:
df[['Title','IMDB Rating']].head()

Unnamed: 0,Title,IMDB Rating
0,The Land Girls,6.1
1,"First Love, Last Rites",6.9
2,I Married a Strange Person,6.8
3,Let's Talk About Sex,
4,Slam,3.4
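
SQL often renames columns while selecting them (`SELECT Title AS title, ...`). A minimal sketch of the same idea in Pandas, assuming a tiny stand-in DataFrame named `sample` (hypothetical, with values copied from the output above) instead of the full dataset:

```python
import pandas as pd

# Hypothetical stand-in for the movies table, so the example runs offline.
sample = pd.DataFrame({
    "Title": ["The Land Girls", "Slam"],
    "IMDB Rating": [6.1, 3.4],
    "MPAA Rating": ["R", "R"],
})

# SELECT Title AS title, [IMDB Rating] AS imdb_rating FROM movies
selected = sample[["Title", "IMDB Rating"]].rename(
    columns={"Title": "title", "IMDB Rating": "imdb_rating"}
)
print(selected)
```

Passing a list of names inside the outer brackets returns a DataFrame; `rename` returns a new object and leaves `sample` untouched.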


## WHERE
```sql
SELECT * FROM movies WHERE [IMDB Rating] > 5
```


In [10]:
df[df['IMDB Rating'] > 5].head()

Unnamed: 0,Title,US Gross,Worldwide Gross,US DVD Sales,Production Budget,Release Date,MPAA Rating,Running Time min,Distributor,Source,Major Genre,Creative Type,Director,Rotten Tomatoes Rating,IMDB Rating,IMDB Votes
0,The Land Girls,146083.0,146083.0,,8000000.0,Jun 12 1998,R,,Gramercy,,,,,,6.1,1071.0
1,"First Love, Last Rites",10876.0,10876.0,,300000.0,Aug 07 1998,R,,Strand,,Drama,,,,6.9,207.0
2,I Married a Strange Person,203134.0,203134.0,,250000.0,Aug 28 1998,,,Lionsgate,,Comedy,,,,6.8,865.0
6,Following,44705.0,44705.0,,6000.0,Apr 04 1999,R,,Zeitgeist,,,,Christopher Nolan,,7.7,15133.0
8,Pirates,1641825.0,6341825.0,,40000000.0,Jul 01 1986,R,,,,,,Roman Polanski,25.0,5.8,3275.0
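
The boolean-mask filter can also be written with `DataFrame.query`, which reads a little closer to a SQL `WHERE` clause. The sketch below assumes a small hypothetical `sample` DataFrame; column names containing spaces need backticks inside the query string:

```python
import pandas as pd

# Hypothetical stand-in for the movies table; None becomes NaN (SQL NULL).
sample = pd.DataFrame({
    "Title": ["The Land Girls", "Let's Talk About Sex", "Slam"],
    "IMDB Rating": [6.1, None, 3.4],
})

# Boolean mask, as used in this post.
mask_result = sample[sample["IMDB Rating"] > 5]

# Same filter expressed as a query string.
query_result = sample.query("`IMDB Rating` > 5")

print(query_result)
```

In both forms the row with a missing rating is dropped, because a comparison against `NaN` evaluates to `False`, much like a comparison against `NULL` in SQL never matches.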


### WHERE - Logical operator AND
```sql
SELECT * FROM movies WHERE [IMDB Rating] > 5 AND [MPAA Rating] = 'PG'
```

In [11]:
df[(df['IMDB Rating'] > 5) & (df['MPAA Rating']=='PG')].head()

Unnamed: 0,Title,US Gross,Worldwide Gross,US DVD Sales,Production Budget,Release Date,MPAA Rating,Running Time min,Distributor,Source,Major Genre,Creative Type,Director,Rotten Tomatoes Rating,IMDB Rating,IMDB Votes
21,1776,0.0,0.0,,4000000.0,Nov 09 1972,PG,,Sony/Columbia,Based on Play,Drama,Historical Fiction,,57.0,7.0,4099.0
59,The Adventures of Huck Finn,24103594.0,24103594.0,,6500000.0,Apr 02 1993,PG,,Walt Disney Pictures,Based on Book/Short Story,Adventure,Historical Fiction,Stephen Sommers,62.0,5.8,3095.0
67,Around the World in 80 Days,42000000.0,42000000.0,,6000000.0,Oct 17 1956,PG,,United Artists,Based on Book/Short Story,Adventure,,,73.0,5.6,21516.0
108,The Blue Butterfly,1610194.0,1610194.0,,10400000.0,Feb 20 2004,PG,,Alliance,Original Screenplay,Drama,Contemporary Fiction,,44.0,6.2,817.0
140,The Basket,609042.0,609042.0,,1300000.0,May 05 2000,PG,,MGM,Original Screenplay,Drama,,,,6.3,343.0


### WHERE - Logical operator OR
```sql
SELECT * FROM movies WHERE [MPAA Rating] = 'PG' OR [MPAA Rating] = 'PG-13'
```

In [17]:
df[(df['MPAA Rating'] == 'PG') | (df['MPAA Rating'] == 'PG-13')].head()

Unnamed: 0,Title,US Gross,Worldwide Gross,US DVD Sales,Production Budget,Release Date,MPAA Rating,Running Time min,Distributor,Source,Major Genre,Creative Type,Director,Rotten Tomatoes Rating,IMDB Rating,IMDB Votes
21,1776,0.0,0.0,,4000000.0,Nov 09 1972,PG,,Sony/Columbia,Based on Play,Drama,Historical Fiction,,57.0,7.0,4099.0
31,3 Ninjas Kick Back,11744960.0,11744960.0,,20000000.0,May 06 1994,PG,,Walt Disney Pictures,Original Screenplay,Action,Contemporary Fiction,,17.0,3.2,3107.0
41,The Abyss,54243125.0,54243125.0,,70000000.0,Aug 09 1989,PG-13,,20th Century Fox,Original Screenplay,Action,Science Fiction,James Cameron,88.0,7.6,51018.0
43,Ace Ventura: Pet Detective,72217396.0,107217396.0,,12000000.0,Feb 04 1994,PG-13,,Warner Bros.,Original Screenplay,Comedy,Contemporary Fiction,Tom Shadyac,49.0,6.6,63543.0
44,Ace Ventura: When Nature Calls,108360063.0,212400000.0,,30000000.0,Nov 10 1995,PG-13,,Warner Bros.,Original Screenplay,Comedy,Contemporary Fiction,Steve Oedekerk,,5.6,51275.0
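
When several `OR` conditions test the same column, SQL usually shortens them to `IN ('PG', 'PG-13')`. A rough Pandas counterpart is `Series.isin`; the sketch below uses a small hypothetical `sample` DataFrame rather than the full dataset:

```python
import pandas as pd

# Hypothetical stand-in for the movies table.
sample = pd.DataFrame({
    "Title": ["1776", "The Abyss", "Slam"],
    "MPAA Rating": ["PG", "PG-13", "R"],
})

# WHERE [MPAA Rating] = 'PG' OR [MPAA Rating] = 'PG-13'
either = sample[(sample["MPAA Rating"] == "PG") | (sample["MPAA Rating"] == "PG-13")]

# WHERE [MPAA Rating] IN ('PG', 'PG-13')
using_isin = sample[sample["MPAA Rating"].isin(["PG", "PG-13"])]

print(using_isin)
```

Note that each comparison combined with `|` or `&` needs its own parentheses, because those operators bind more tightly than `==` and `>` in Python.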


### WHERE - Logical operator NOT
```sql
SELECT * FROM movies WHERE [Rotten Tomatoes Rating] IS NOT NULL
```

In [22]:
df[~df['Rotten Tomatoes Rating'].isnull()].head()

Unnamed: 0,Title,US Gross,Worldwide Gross,US DVD Sales,Production Budget,Release Date,MPAA Rating,Running Time min,Distributor,Source,Major Genre,Creative Type,Director,Rotten Tomatoes Rating,IMDB Rating,IMDB Votes
3,Let's Talk About Sex,373615.0,373615.0,,300000.0,Sep 11 1998,,,Fine Line,,Comedy,,,13.0,,
4,Slam,1009819.0,1087521.0,,1000000.0,Oct 09 1998,R,,Trimark,Original Screenplay,Drama,Contemporary Fiction,,62.0,3.4,165.0
8,Pirates,1641825.0,6341825.0,,40000000.0,Jul 01 1986,R,,,,,,Roman Polanski,25.0,5.8,3275.0
9,Duel in the Sun,20400000.0,20400000.0,,6000000.0,Dec 31 2046,,,,,,,,86.0,7.0,2906.0
10,Tom Jones,37600000.0,37600000.0,,1000000.0,Oct 07 1963,,,,,,,,81.0,7.0,4035.0
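
The `~` operator negates any boolean mask, which is what makes it the counterpart of `NOT`. For the specific case of `IS NOT NULL`, Pandas also offers `notna()`, which avoids the negation. A minimal sketch on a hypothetical `sample` DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for the movies table; None becomes NaN (SQL NULL).
sample = pd.DataFrame({
    "Title": ["The Land Girls", "Let's Talk About Sex", "Slam"],
    "Rotten Tomatoes Rating": [None, 13.0, 62.0],
})

# NOT (x IS NULL): negate the isnull() mask with ~
negated = sample[~sample["Rotten Tomatoes Rating"].isnull()]

# x IS NOT NULL: ask for non-missing values directly
direct = sample[sample["Rotten Tomatoes Rating"].notna()]

print(direct)
```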


## GROUP BY
```sql
SELECT [Major Genre],COUNT(*) FROM movies GROUP BY [Major Genre] ORDER BY 2 DESC
```

In [27]:
df.groupby('Major Genre').size().sort_values(ascending=False)


Major Genre
Drama                  789
Comedy                 675
Action                 420
Adventure              274
Thriller/Suspense      239
Horror                 219
Romantic Comedy        137
Musical                 53
Documentary             43
Black Comedy            36
Western                 36
Concert/Performance      5
dtype: int64
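
One difference worth knowing: SQL's `GROUP BY` keeps `NULL` as its own group, whereas Pandas' `groupby` drops rows whose key is missing by default. The sketch below, on a hypothetical four-row `sample` DataFrame, shows the default behaviour, the `dropna=False` option (available in pandas 1.1 and later), and `value_counts` as a shorthand for count-and-sort:

```python
import pandas as pd

# Hypothetical stand-in for the movies table; the first movie has no genre.
sample = pd.DataFrame({
    "Title": ["The Land Girls", "First Love, Last Rites",
              "I Married a Strange Person", "Let's Talk About Sex"],
    "Major Genre": [None, "Drama", "Comedy", "Comedy"],
})

# Default: rows with a missing genre are left out of the counts.
counts = sample.groupby("Major Genre").size().sort_values(ascending=False)

# Closer to SQL: keep the missing genre as its own group.
counts_with_null = sample.groupby("Major Genre", dropna=False).size()

# Shorthand for counting and sorting in descending order.
shortcut = sample["Major Genre"].value_counts()

print(counts)
print(counts_with_null)
```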

## JOIN
`JOIN` is probably one of the most important operations in SQL: it lets you combine two or more tables and work with the resulting dataset. To demonstrate, we first build a small table of favourite movies.

In [31]:
data = [['Slam',5],['Pirates',8],['Duel in the Sun',7]]
df_favmovies = pd.DataFrame(data, columns=['my_fav_movies','my_rating'])

In [32]:
df_favmovies.head()

Unnamed: 0,my_fav_movies,my_rating
0,Slam,5
1,Pirates,8
2,Duel in the Sun,7


### INNER JOIN
```sql
SELECT m1.*,m2.*
FROM movies m1
INNER JOIN my_fav_movies m2
ON m1.Title = m2.my_fav_movies
```

In [33]:
pd.merge(df,df_favmovies, left_on='Title', right_on='my_fav_movies', how='inner')

Unnamed: 0,Title,US Gross,Worldwide Gross,US DVD Sales,Production Budget,Release Date,MPAA Rating,Running Time min,Distributor,Source,Major Genre,Creative Type,Director,Rotten Tomatoes Rating,IMDB Rating,IMDB Votes,my_fav_movies,my_rating
0,Slam,1009819.0,1087521.0,,1000000.0,Oct 09 1998,R,,Trimark,Original Screenplay,Drama,Contemporary Fiction,,62.0,3.4,165.0,Slam,5
1,Pirates,1641825.0,6341825.0,,40000000.0,Jul 01 1986,R,,,,,,Roman Polanski,25.0,5.8,3275.0,Pirates,8
2,Duel in the Sun,20400000.0,20400000.0,,6000000.0,Dec 31 2046,,,,,,,,86.0,7.0,2906.0,Duel in the Sun,7
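
Because the join keys have different names, the merged result carries both `Title` and `my_fav_movies`, which hold the same values. A minimal sketch of dropping the redundant key, roughly `SELECT m1.*, m2.my_rating` in SQL, using small hypothetical DataFrames `movies_sample` and `favs`:

```python
import pandas as pd

# Hypothetical stand-ins for the two tables.
movies_sample = pd.DataFrame({
    "Title": ["The Land Girls", "Slam", "Pirates"],
    "IMDB Rating": [6.1, 3.4, 5.8],
})
favs = pd.DataFrame(
    [["Slam", 5], ["Pirates", 8], ["Duel in the Sun", 7]],
    columns=["my_fav_movies", "my_rating"],
)

# Inner join keeps only titles present in both tables,
# then the duplicate key column from the right table is dropped.
joined = movies_sample.merge(
    favs, left_on="Title", right_on="my_fav_movies", how="inner"
).drop(columns="my_fav_movies")

print(joined)
```

`DataFrame.merge` is the method form of `pd.merge`; both take the same `left_on`, `right_on`, and `how` arguments.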


### LEFT OUTER JOIN
```sql
SELECT m1.*,m2.*
FROM movies m1
LEFT OUTER JOIN my_fav_movies m2
ON m1.Title = m2.my_fav_movies
```

In [35]:
pd.merge(df,df_favmovies, left_on='Title', right_on='my_fav_movies', how='left').head()

Unnamed: 0,Title,US Gross,Worldwide Gross,US DVD Sales,Production Budget,Release Date,MPAA Rating,Running Time min,Distributor,Source,Major Genre,Creative Type,Director,Rotten Tomatoes Rating,IMDB Rating,IMDB Votes,my_fav_movies,my_rating
0,The Land Girls,146083.0,146083.0,,8000000.0,Jun 12 1998,R,,Gramercy,,,,,,6.1,1071.0,,
1,"First Love, Last Rites",10876.0,10876.0,,300000.0,Aug 07 1998,R,,Strand,,Drama,,,,6.9,207.0,,
2,I Married a Strange Person,203134.0,203134.0,,250000.0,Aug 28 1998,,,Lionsgate,,Comedy,,,,6.8,865.0,,
3,Let's Talk About Sex,373615.0,373615.0,,300000.0,Sep 11 1998,,,Fine Line,,Comedy,,,13.0,,,,
4,Slam,1009819.0,1087521.0,,1000000.0,Oct 09 1998,R,,Trimark,Original Screenplay,Drama,Contemporary Fiction,,62.0,3.4,165.0,Slam,5.0
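
Two details of a left join are easy to miss: unmatched rows get `NaN` in the right-hand columns, and an integer column such as `my_rating` is converted to float to hold those `NaN` values (hence `5.0` above). The sketch below, on hypothetical `movies_sample` and `favs` DataFrames, uses the `indicator=True` option of `merge` to label where each row came from, similar to checking `m2.my_fav_movies IS NULL` in SQL:

```python
import pandas as pd

# Hypothetical stand-ins for the two tables.
movies_sample = pd.DataFrame({
    "Title": ["The Land Girls", "Slam", "Pirates"],
    "IMDB Rating": [6.1, 3.4, 5.8],
})
favs = pd.DataFrame(
    [["Slam", 5], ["Pirates", 8], ["Duel in the Sun", 7]],
    columns=["my_fav_movies", "my_rating"],
)

# indicator=True adds a "_merge" column: "both" or "left_only" for a left join.
left = movies_sample.merge(
    favs, left_on="Title", right_on="my_fav_movies", how="left", indicator=True
)

# Movies that have no entry in the favourites table.
unmatched = left[left["_merge"] == "left_only"]

print(left)
print(left["my_rating"].dtype)
```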
