# Introduction to Python Programming
### Dr. Robert G. de Luna, PECE

## A. Welcome to Jupyter Notebook
Take a moment to get your bearings. Study the icons above.

There are two major types of cells:

1) Markdown cells - plain text. You can use HTML tags like <b>BOLD</b> or LaTeX like $\beta$.

2) Code cells - cells where we can run code.

<b>Shortcuts</b>

1) <b>CTRL-M</b> then <b>H</b> to see help

2) <b>CTRL-M</b> then <b>S</b> to save notebook

3) <b>CTRL-ENTER</b> to Run Code but stay in the same cell

4) <b>SHIFT-ENTER</b> to Run Code and advance to the next cell

5) You can use <b>TAB</b> to see available functions. You can use <b>SHIFT-TAB</b> repeatedly for the documentation.

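6) <b>%pylab inline</b>, placed before everything else in the notebook, imports matplotlib and numpy and makes plots appear inside the notebook. As the output below notes, <b>%pylab</b> is deprecated; the recommended replacement is <b>%matplotlib inline</b> together with explicit imports of the libraries you need.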

In [1]:
%pylab inline

%pylab is deprecated, use %matplotlib inline and import the required libraries.
Populating the interactive namespace from numpy and matplotlib


## B. Variables and Data Types

This section covers five standard Python data types:

### B.1. Numbers

In [53]:
variable_numeric = 123
pi = 3.141592653589793238462643383279502884197169399375105820974944592307816406286

In [54]:
variable_numeric_type = type(variable_numeric)
pi_type = type(pi)

In [55]:
print(variable_numeric_type, pi_type, sep='\n') 

<class 'int'>
<class 'float'>


NOTE: variable_numeric is an **Integer** type, so it holds whole numbers only, while pi is a **Float** type, which can hold a fractional part. A float stores roughly 15 to 17 significant digits, so the extra digits typed for pi above are not kept.
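
The difference between the two types can be checked directly. The sketch below is an added illustration, and the variable names (`whole`, `fraction`, `long_pi`) are chosen here for the example. It shows that an integer and a float with the same value compare equal, that `int()` truncates toward zero, that `/` always returns a float, and that a long literal is rounded to the precision a float can hold.

```python
whole = 123          # int: no fractional part
fraction = 123.0     # float: same value, different type

print(type(whole), type(fraction))
print(whole == fraction)      # True: the values compare equal
print(int(3.99))              # 3: int() truncates toward zero
print(type(7 / 2), 7 / 2)     # the / operator returns a float, here 3.5

# A long literal is rounded to the nearest value a float can represent.
long_pi = 3.141592653589793238462643383279
print(long_pi)                # 3.141592653589793
```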

### B.2. Strings

Strings may be declared with single quotes ('), double quotes ("), or triple quotes ("""), which also allow a string to span several lines. The styles are interchangeable, but it is good practice to follow one convention consistently.
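
As a small added illustration (the variable names are made up for this example), all three quoting styles produce ordinary strings. The choice mainly matters when the text itself contains a quote character or spans several lines.

```python
single = 'Hello World!'
double = "Hello World!"
triple = """Hello World!"""

print(single == double == triple)   # True

# Double quotes make it easy to include an apostrophe.
quote_inside = "It's a string"
print(quote_inside)

# Triple quotes keep the line break inside the string.
multi_line = """first line
second line"""
print(multi_line.splitlines())      # ['first line', 'second line']
```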

In [56]:
varText = 'This is a string'
varString = "Hello World!" 

In [57]:
print(varText)

This is a string


In [58]:
print(varString)

Hello World!


In [59]:
print(f"The length of the varString is {len(varString)}")

The length of the varString is 12


In [61]:
print(f"The length of the varText is {len(varText)}")

The length of the varText is 16


In [62]:
age = 18
name = "Angel"

In [63]:
print("My age is",age,"and my name is",name)
print("My age is %d and my name is %s" % (age,name))   
print(f"My age is {age} and my name is {name}")
print("My age is {} and my name is {}".format(age,name))
print("My age is {two} and my name is {one}".format(one = name, two = age))

My age is 18 and my name is Angel
My age is 18 and my name is Angel
My age is 18 and my name is Angel
My age is 18 and my name is Angel
My age is 18 and my name is Angel


### B.3. Lists

In [64]:
varList = ['abc', 123]

In [69]:
print(varList)

['abc', 123]


In [73]:
nested_list = [1, 2, 3, [4, 5 ,["target"]]]

In [74]:
print(nested_list)

[1, 2, 3, [4, 5, ['target']]]


You may obtain the number of elements in a list by calling the <b>len()</b> function

In [80]:
print(f"Variables: {len(nested_list)}")


Variables: 4


In [81]:
ii = 0
for i in nested_list:
    if isinstance(i, list):
        ii += len(i)
    else:
        ii += 1

print(f"All the variables: {ii}")


All the variables: 6
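
The loop above looks only one level deep, so an inner list such as `['target']` is counted as a single element. A recursive helper can count the values at every depth. The function name `count_items` is invented for this sketch; it is one possible approach, not the only one.

```python
def count_items(items):
    """Count the non-list values in a list, descending into nested lists."""
    total = 0
    for item in items:
        if isinstance(item, list):
            total += count_items(item)   # recurse into the inner list
        else:
            total += 1
    return total

nested_list = [1, 2, 3, [4, 5, ["target"]]]
print(count_items(nested_list))          # 6
print(count_items([1, [2, [3, 4]]]))     # 4
```

For `nested_list` both approaches give 6, because the innermost list holds exactly one value. They differ as soon as a deeper list holds more than one value, as in the second call.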


You can think of Lists as similar to ArrayLists: the index starts at 0, and you obtain an element by writing its index in brackets after the name of the list.

In [82]:
print(varList[0])

abc


In [83]:
print(nested_list[0])  

1


In [84]:
print(nested_list[1])

2


In [85]:
print(nested_list[2])

3


In [86]:
print(nested_list[3])


[4, 5, ['target']]


In [87]:
# What is the code to print the string, target
             # 0  1  2 3 0  1 2 0 
nested_list = [1, 2, 3, [4, 5, ['target']]]

print(nested_list[3][2][0])

target
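
Two further details of bracket indexing, shown here as an added sketch: negative indices count from the end of the list, and an index outside the list raises an `IndexError`.

```python
nested_list = [1, 2, 3, [4, 5, ["target"]]]

print(nested_list[-1])          # [4, 5, ['target']]
print(nested_list[-1][-1][0])   # target

try:
    nested_list[4]              # valid indices are 0 to 3
except IndexError as error:
    print("IndexError:", error)
```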


### B.4. Tuples

In [126]:
varTuple = ("abc", 123, "Hello")

In [127]:
print(varTuple)

('abc', 123, 'Hello')


In [128]:
len(varTuple)

3

In [129]:
print(varTuple[0])

abc


In [130]:
print(varTuple[1])  

123


In [131]:
print(varTuple[2])

Hello


### Comparison Between List and Tuples

Items can be appended to and removed from a List. As the cells below show, a newly appended item is placed at the end of the List.

<b>HINT:</b> Type <b><i>varList.</i></b> or <b><i>varTuple.</i></b> on a line and press <b>TAB</b> after the period (.) to view the functions you can call from that variable. You may also press <b>SHIFT + TAB</b> to view the documentation.

In [132]:
print(varList)

['abc', 123, 'HELLO']


In [133]:
len(varList)

3

In [134]:
# Append the string "HELLO" to the varList using the TAB

In [135]:
varList.append("HELLO")

In [136]:
print(varList)

['abc', 123, 'HELLO', 'HELLO']


In [137]:
varList.remove("HELLO")

In [138]:
print(varList)

['abc', 123, 'HELLO']


In [139]:
# Append the string "GOODBYE" to the varTuple using the TAB

In [140]:
varTuple = varTuple + ("GOODBYE",)

print(varTuple)

('abc', 123, 'Hello', 'GOODBYE')


In [142]:
varTuple = tuple(varList)

It may seem that the only difference between Tuples and Lists is that Tuples use parentheses while Lists use brackets, but there is an important one: Tuples are fixed structures, so their elements cannot be appended, removed, or reassigned after creation. This is why the cell above builds a new Tuple with <b>+</b> instead of appending to the existing one. Lists also have many more functions readily available than Tuples.

However, Tuples use less memory than Lists holding the same items, as the sizes printed below show, and can be slightly faster to create and iterate over. As a rule of thumb, use a Tuple when the contents are fixed and a List when elements will be added, removed, or changed.

In [143]:
print(varList)
print(varList.__sizeof__())
print(varTuple)
print(varTuple.__sizeof__())

['abc', 123, 'HELLO']
104
('abc', 123, 'HELLO')
48
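
The fixed nature of a Tuple can be seen by trying to change one. This is an added sketch with names (`var_list`, `var_tuple`) chosen for the example. It uses `sys.getsizeof`, the standard-library way to ask for an object's size; the exact byte counts depend on the Python version and platform, so none are shown here.

```python
import sys

var_list = ["abc", 123, "HELLO"]
var_tuple = tuple(var_list)

var_list[0] = "xyz"            # lists can be changed in place
try:
    var_tuple[0] = "xyz"       # tuples cannot
except TypeError as error:
    print("TypeError:", error)

# The tuple is expected to be the smaller of the two.
print(sys.getsizeof(var_list), sys.getsizeof(var_tuple))
```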


### B.5. Dictionaries

In [144]:
dictionary = {"key1": "Item1", "key2": "Item2"}


In [145]:
print(dictionary)

{'key1': 'Item1', 'key2': 'Item2'}


In [146]:
len(dictionary)

2

In [151]:
print(dictionary["key1"])

Item1


In [152]:
print(dictionary["key2"])

Item2


In [153]:
print(tuple(dictionary.values())[0])


Item1


In [154]:
print(tuple(dictionary.values())[1])

Item2


In [155]:
var = 5
varDictionary = {"First": "2", "2": "2nd", 3: var}

In [156]:
print(varDictionary)

{'First': '2', '2': '2nd', 3: 5}


You may also declare contents of dictionaries individually.

In [159]:
variable_dictionary = {}

variable_dictionary["First"] = 1
variable_dictionary["2"] = "2nd"
variable_dictionary[3] = var
print(variable_dictionary)

{'First': 1, '2': '2nd', 3: 5}
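
A few common ways to read a dictionary, given as an added sketch that reuses the dictionary printed above. Note that the string `"2"` and the integer `2` are different keys, and that looking up a missing key with brackets raises a `KeyError`, while `get()` returns a default instead.

```python
variable_dictionary = {"First": 1, "2": "2nd", 3: 5}

print("First" in variable_dictionary)          # True
print(2 in variable_dictionary)                # False: only the string "2" is a key
print(variable_dictionary.get("missing"))      # None
print(variable_dictionary.get("missing", 0))   # 0

# Loop over every key and its value.
for key, value in variable_dictionary.items():
    print(key, "->", value)

try:
    variable_dictionary["missing"]
except KeyError as error:
    print("KeyError:", error)
```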


In [161]:
variable = "Hello World!"
variable_dictionar_new = {"First": 123, 2: "abc", "3": variable, 4: ["listA", "listB"]}

In [162]:
print(variable_dictionar_new)

{'First': 123, 2: 'abc', '3': 'Hello World!', 4: ['listA', 'listB']}


In [163]:
print(variable_dictionar_new["First"])

123


In [164]:
print(variable_dictionar_new[2])


abc


In [165]:
print(variable_dictionar_new["3"])

Hello World!


In [166]:
print(variable_dictionar_new[4])

['listA', 'listB']


In [168]:
print(variable_dictionar_new[4][0])

listA


In [170]:
print(variable_dictionar_new[4][1])

listB


## C. Python Arithmetic

Python provides the basic arithmetic operators found in most, if not all, programming languages.

### C.1. Addition

In [171]:
a = 5 + 3
a

8

### C.2. Subtraction

In [172]:
a = 5 - 3
a

2

### C.3. Multiplication

In [173]:
a = 5 * 3
a

15

### C.4. Exponentiation

In [174]:
a = 5 ** 3
a

125

### C.5. Division

In [175]:
a = 5 / 3
a

1.6666666666666667

### C.6. Modulus Division

In [178]:
a = 5 % 3
a

2

### C.7. Integer Division

In [179]:
a = 5 // 3
a

1

### C.8. Increment

In [180]:
a = 5
a += 1
a

6

### C.9. Decrement

In [181]:
a = 5
a -= 1
a

4

<b>NOTE:</b> Python does not support the increment/decrement syntax <b>x++/x--</b>. Instead, use <b>x+=1/x-=1</b>, which is equivalent to <b>x=x+1/x=x-1</b>.

### C.10. String Concatenation

In [182]:
a = 'Hello ' + 'World!'
a

'Hello World!'

Strings may also be appended with the use of the plus <b>(+)</b> symbol

In [183]:
a="Hello"
b=" World!"
c=a+b
c

'Hello World!'

### C.11. Complex Expressions

In [185]:
a = 3 + 5 - 6 * 2 / 4
a

5.0
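
The result follows from operator precedence: multiplication and division are evaluated before addition and subtraction, operators of equal precedence are evaluated from left to right, and `/` always returns a float. The added sketch below spells out the intermediate step and shows how parentheses change the outcome.

```python
a = 3 + 5 - 6 * 2 / 4
step = 3 + 5 - (12 / 4)   # 6 * 2 is evaluated first, then 12 / 4 gives 3.0
print(a, step)            # 5.0 5.0

b = (3 + 5 - 6) * 2 / 4   # parentheses force the addition and subtraction first
print(b)                  # 1.0
```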

### C.12. Comparison Operators

In [186]:
1 > 2

False

In [187]:
1 < 2

True

In [188]:
1 >= 1

True

In [189]:
1 <= 4

True

In [190]:
1 == 1

True

In [191]:
'hi' == 'bye'

False

### C.13. Logical Operators

In [192]:
(1 > 2) and (2 < 3)

False

In [193]:
(1 > 2) or (2 < 3)

True

In [194]:
(1 == 2) or (2 == 3) or (4 == 4)

True

## Challenge! Write the following as code:

$$ g = \frac{1}{1+e^{-z}}  $$

1) With z = 8 and e = 2.718, g should be equal to 0.9996643717

2) With z = 2 and e = 2.718, g should be equal to 0.8807753039

<b><i>TRIVIA</i></b>: The value <b>e</b>, also called <b>Euler's Number</b>, is a mathematical constant approximately equal to <b>2.71828</b>. It is irrational, meaning its decimal expansion never ends or repeats and it cannot be written exactly as a fraction of two integers, similar to <b>pi</b>.

In [206]:
# Type your Code Here for Requirement Number 1:
z = 8
e = 2.718  # the approximation of Euler's number given in the challenge

g = 1 / (1 + e ** (-z))
g


0.9996643716832646

In [207]:
# Type your Code Here for Requirement Number 2:
z = 2
e = 2.718  # the approximation of Euler's number given in the challenge

g = 1 / (1 + e ** (-z))
g


0.8807753038918279
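
The formula can also be wrapped in a reusable function. The sketch below is an added illustration: the name `sigmoid` and the optional `e` parameter are choices made here, not part of the original exercise. Passing `e=2.718` reproduces the challenge values, while the default uses the full-precision constant `math.e` and therefore gives slightly different digits.

```python
import math

def sigmoid(z, e=math.e):
    """Return 1 / (1 + e**(-z)); e defaults to the full-precision constant."""
    return 1 / (1 + e ** (-z))

print(sigmoid(8, e=2.718))   # matches requirement 1
print(sigmoid(2, e=2.718))   # matches requirement 2
print(sigmoid(0))            # 0.5
print(sigmoid(8))            # uses math.e, so the last digits differ
```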

## D. Control Statements and Data Structures

## D.A. Conditional Statement

### D.A.1. Boolean Condition

In [209]:
x = True
if x:
    print("var x is True")
else:
    print("var x is False")

var x is True


In [210]:
x = False
if x:
    print("var x is True")
else:
    print("var x is False")

var x is False


### D.A.2. String Condition

In [211]:
x = "Hello World!"

if x == 'Hello World!':
    print("var x is Hello World!")
else:
    print("var x is not Hello World!")

var x is Hello World!


In [212]:
x = "Hello Earth!"

if x == 'Hello World!':
    print("var x is Hello World!")
else:
    print("var x is not Hello World!")

var x is not Hello World!


### D.A.3. Numerical Condition

In [217]:
x = '10'

if x == '10':
    print("var x is a String")
elif x == 10:
    print("var x is an Integer")
else:
    print("var x is none of the above")


var x is a String


In [218]:
x = 10

if x == '10':
    print("var x is a String")
elif x == 10:
    print("var x is an Integer")
else:
    print("var x is none of the above")

var x is an Integer


In [219]:
x = 5

if x == '10':
    print("var x is a String")
elif x == 10:
    print("var x is an Integer")
else:
    print("var x is none of the above")

var x is none of the above


### D.A.4. Multiple Conditions

In [220]:
x = 10

if x > 5 and x < 15 and x == 10:
    print("var x is really 10!")
else:
    print("var x is not really 10")

var x is really 10!


In [221]:
x = 15

if x > 5 and x < 15 and x == 10:
    print("var x is really 10!")
else:
    print("var x is not really 10")

var x is not really 10


In [222]:
x = 10

if x == 10 or x == 20:
    print("var x can be 10 or 20")
else:
    print("var x is not 10 nor 20")

var x can be 10 or 20


In [223]:
x = 20

if x == 10 or x == 20:
    print("var x can be 10 or 20")
else:
    print("var x is not 10 nor 20")

var x can be 10 or 20


In [224]:
x = 12

if x == 10 or x == 20:
    print("var x can be 10 or 20")
else:
    print("var x is not 10 nor 20")

var x is not 10 nor 20


## D.B. Loops

### D.B.1. For Loops

In [225]:
for var in range(0,5,2):
    print(var)

0
2
4


<b>NOTE:</b> The command <b>range(0,5,2)</b> produces the numbers starting at 0 and increasing by 2 for as long as they stay below 5, that is, 0, 2, and 4. The stop value itself is never included.

In [226]:
[v for v in range(1, 100, 5)]

[1, 6, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56, 61, 66, 71, 76, 81, 86, 91, 96]

<b>NOTE:</b> range([start], stop, [step]): <b>stop</b> is required, <b>start</b> defaults to 0, and <b>step</b> defaults to 1.
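
The different ways of calling `range` can be compared side by side. This is an added sketch; `list()` is used only to make the generated numbers visible.

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]   stop only
print(list(range(2, 5)))       # [2, 3, 4]         start and stop
print(list(range(0, 5, 2)))    # [0, 2, 4]         start, stop, and step
print(list(range(5, 0, -1)))   # [5, 4, 3, 2, 1]   negative step counts down
print(list(range(5, 0)))       # []                empty: cannot count up from 5 to 0
```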

### D.B.2. While Loops

In [227]:
var = 0
while var < 5:
    print(var)
    var += 2

0
2
4


### D.B.3. Nested Loops

In [228]:
x = 0
while x < 5:
    for y in range(0, x):
        print(y, end='')
    x+=1
    print()


0
01
012
0123


Always take note that the line where one declares a loop or condition must end with a colon <b>(:)</b>, and the statements it controls must be indented beneath it.

## E. Data Slicing

<b>NOTE:</b> <i>list[start:end]</i>: START is inclusive and END is exclusive. Both are optional: omitting START begins at the first element, and omitting END runs to the last.

In [235]:
varList = [i for i in range(1,11)]

In [236]:
print(varList)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In [251]:
varList[:5]

[1, 2, 3, 4, 5]

In [250]:
varList[5:]


[6, 7, 8, 9, 10]


In [256]:
varList[:-2]

[1, 2, 3, 4, 5, 6, 7, 8]

In [257]:
varList[-2:]

[9, 10]

In [263]:
varList[2:8]

[3, 4, 5, 6, 7, 8]

In [272]:
varList[2:8:2]

[3, 5, 7]

In [273]:
varList[-2:-8:-2]

[9, 7, 5]

<b>NOTE:</b> <i>list[start:end:step]</i>: all three parts are optional, and a negative STEP walks through the list from right to left.
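
A few more slices, given as an added sketch with a list named `var_list` for the example. They show the step on its own, a negative step written with positive indices, an END beyond the list (which is simply cut off at the last element), and the same notation applied to a string.

```python
var_list = list(range(1, 11))   # [1, 2, ..., 10]

print(var_list[::3])       # [1, 4, 7, 10]
print(var_list[::-1])      # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(var_list[8:2:-2])    # [9, 7, 5]
print(var_list[5:100])     # [6, 7, 8, 9, 10]
print("Python"[1:4])       # yth
```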

## F. Vectors and Matrices
In Python, the <b>NUMPY</b> library provides functions for working with vectors and matrices. It is conventionally imported under the alias <b>np</b> through <b>import numpy as np</b>.

In [274]:
import numpy as np

In [276]:
np.array(range(100))

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

It is the same as: 

In [277]:
np.arange(0, 100)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

In [278]:
print(np.arange(0, 100))

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99]


## F.A. Vector Computations

### F.A.1. Vector to Scalar

In [None]:
varArray = np.arange(0, 5)
varArray

In [None]:
varArray * 2

### F.A.2. Vector to Vector: Dot Product

In [None]:
varArrayA = np.arange(0,5)
varArrayB = np.arange(5,10)

In [None]:
print(varArrayA)

In [None]:
print(varArrayB)

In [None]:
print(np.dot(varArrayA, varArrayB))

### F.A.3. Vector to Vector: Element-wise Multiplication

In [None]:
varArrayA * varArrayB

## F.B. Matrix Computations

### F.B.1. Matrix to Scalar: Element-wise Multiplication

In [None]:
mat_a = np.random.randint(0, 5, size=(4,4))

In [None]:
print(mat_a)

In [None]:
print(mat_a * 2)

### F.B.2. Matrix to Matrix: Matrix Multiplication 

In [None]:
mat_a @ mat_a  # matrix product; mat_a * mat_a would multiply element by element
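
For NumPy arrays, `*` multiplies element by element, while `@` (or `np.dot`) performs matrix multiplication. The added sketch below uses a small fixed matrix, named `mat` for this example, so that the two results can be checked by hand.

```python
import numpy as np

mat = np.array([[1, 2],
                [3, 4]])

elementwise = mat * mat   # each entry squared
product = mat @ mat       # rows of the first times columns of the second

print(elementwise)        # [[ 1  4]
                          #  [ 9 16]]
print(product)            # [[ 7 10]
                          #  [15 22]]
print(np.array_equal(product, np.dot(mat, mat)))   # True
```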

### F.B.3. Advanced Matrix Operations

In [None]:
from scipy.linalg import eig

In [None]:
eig(mat_a)

## G. Data Manipulation Using Pandas

In [25]:
import pandas as pd

In [26]:

data = pd.read_csv("movie_metadata.csv")
data

Unnamed: 0,movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Avatar,Color,James Cameron,723.0,178,0.0,855.0,Joel David Moore,1000.0,760505847.0,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000.0
1,Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0.0
2,Spectre,Color,Sam Mendes,602.0,148,0.0,161.0,Rory Kinnear,11000.0,200074175.0,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000.0
3,The Dark Knight Rises,Color,Christopher Nolan,813.0,164,22000.0,23000.0,Christian Bale,27000.0,448130642.0,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000.0
4,Star Wars: Episode VII - The Force Awakens ...,,Doug Walker,,,131.0,,Rob Walker,131.0,,...,,,,,,,12.0,7.1,,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5039,The Following,Color,,43.0,43,,319.0,Valorie Curry,841.0,,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000.0
5040,A Plague So Pleasant,Color,Benjamin Roberds,13.0,76,0.0,0.0,Maxwell Moody,0.0,,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16.0
5041,Shanghai Calling,Color,Daniel Hsia,14.0,100,0.0,489.0,Daniel Henney,946.0,10443.0,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660.0
5042,My Date with Drew,Color,Jon Gunn,43.0,90,16.0,16.0,Brian Herzlinger,86.0,85222.0,...,84.0,English,USA,PG,1100.0,2004.0,23.0,6.6,1.85,456.0


In [27]:
#For the Number of Rows and Columns
data.shape


(5044, 28)

In [28]:
#For the Name of Columns
data.columns


Index(['movie_title', 'color', 'director_name', 'num_critic_for_reviews',
       'duration', 'director_facebook_likes', 'actor_3_facebook_likes',
       'actor_2_name', 'actor_1_facebook_likes', 'gross', 'genres',
       'actor_1_name', 'num_voted_users', 'cast_total_facebook_likes',
       'actor_3_name', 'facenumber_in_poster', 'plot_keywords',
       'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',
       'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',
       'imdb_score', 'aspect_ratio', 'movie_facebook_likes'],
      dtype='object')

In [29]:
#For the Statistical Summary
data.describe()

Unnamed: 0,num_critic_for_reviews,director_facebook_likes,actor_3_facebook_likes,actor_1_facebook_likes,gross,num_voted_users,cast_total_facebook_likes,facenumber_in_poster,num_user_for_reviews,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
count,4993.0,4939.0,5020.0,5036.0,4159.0,5043.0,5043.0,5030.0,5022.0,4551.0,4936.0,5030.0,5043.0,4714.0,5043.0
mean,140.194272,686.509212,645.009761,6560.047061,48468410.0,83668.16,9699.063851,1.371173,272.770808,39752620.0,2002.472853,1651.754473,6.442138,2.220403,7525.964505
std,121.601675,2813.328607,1665.041728,15020.75912,68452990.0,138485.3,18163.799124,2.013576,377.982886,206114900.0,12.474414,4042.438863,1.125116,1.385113,19320.44511
min,1.0,0.0,0.0,0.0,162.0,5.0,0.0,0.0,1.0,218.0,1916.0,0.0,1.6,1.18,0.0
25%,50.0,7.0,133.0,614.0,5340988.0,8593.5,1411.0,0.0,65.0,6000000.0,1999.0,281.0,5.8,1.85,0.0
50%,110.0,49.0,371.5,988.0,25517500.0,34359.0,3090.0,1.0,156.0,20000000.0,2005.0,595.0,6.6,2.35,166.0
75%,195.0,194.5,636.0,11000.0,62309440.0,96309.0,13756.5,2.0,326.0,45000000.0,2011.0,918.0,7.2,2.35,3000.0
max,813.0,23000.0,23000.0,640000.0,760505800.0,1689764.0,656730.0,43.0,5060.0,12215500000.0,2016.0,137000.0,9.5,16.0,349000.0


In [30]:
#For the Data Correlations
# Get the Features to predict the outputs

'''
gross = num_voted_users -> 0.637271
movie_facebook_likes = num_critic_for_reviews -> 0.683176
'''
data.corr(numeric_only=True)

Unnamed: 0,num_critic_for_reviews,director_facebook_likes,actor_3_facebook_likes,actor_1_facebook_likes,gross,num_voted_users,cast_total_facebook_likes,facenumber_in_poster,num_user_for_reviews,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
num_critic_for_reviews,1.0,0.180674,0.271646,0.190016,0.480601,0.624943,0.263203,-0.033897,0.609387,0.119994,0.275707,0.282306,0.305303,-0.049786,0.683176
director_facebook_likes,0.180674,1.0,0.120199,0.090723,0.144945,0.297057,0.119549,-0.041268,0.22189,0.02109,-0.06382,0.119601,0.170802,0.001642,0.162048
actor_3_facebook_likes,0.271646,0.120199,1.0,0.249927,0.308026,0.287239,0.47392,0.099368,0.230189,0.047451,0.096137,0.559662,0.052633,-0.003366,0.278844
actor_1_facebook_likes,0.190016,0.090723,0.249927,1.0,0.154468,0.192804,0.951661,0.072257,0.145461,0.022639,0.086873,0.390487,0.076099,-0.020049,0.135348
gross,0.480601,0.144945,0.308026,0.154468,1.0,0.637271,0.2474,-0.027755,0.559958,0.102179,0.030886,0.262768,0.198021,0.069346,0.378082
num_voted_users,0.624943,0.297057,0.287239,0.192804,0.637271,1.0,0.265911,-0.026998,0.798406,0.079621,0.007397,0.27079,0.410965,-0.014761,0.537924
cast_total_facebook_likes,0.263203,0.119549,0.47392,0.951661,0.2474,0.265911,1.0,0.091475,0.206923,0.036557,0.109971,0.628404,0.085787,-0.017885,0.209786
facenumber_in_poster,-0.033897,-0.041268,0.099368,0.072257,-0.027755,-0.026998,0.091475,1.0,-0.069018,-0.019559,0.061504,0.071228,-0.062958,0.013713,0.008918
num_user_for_reviews,0.609387,0.22189,0.230189,0.145461,0.559958,0.798406,0.206923,-0.069018,1.0,0.084292,-0.003147,0.219496,0.292475,-0.024719,0.400594
budget,0.119994,0.02109,0.047451,0.022639,0.102179,0.079621,0.036557,-0.019559,0.084292,1.0,0.045726,0.044236,0.030688,0.006598,0.062039


In [31]:
#To See Columns for Missing Value
data.isnull().sum().sort_values(ascending=False)


gross                        885
budget                       493
aspect_ratio                 330
content_rating               303
plot_keywords                154
title_year                   108
director_facebook_likes      105
director_name                104
num_critic_for_reviews        51
actor_3_facebook_likes        24
actor_3_name                  23
num_user_for_reviews          22
color                         20
duration                      15
language                      15
actor_2_facebook_likes        14
facenumber_in_poster          14
actor_2_name                  13
actor_1_facebook_likes         8
actor_1_name                   7
country                        5
imdb_score                     1
movie_facebook_likes           1
movie_imdb_link                1
cast_total_facebook_likes      1
num_voted_users                1
genres                         0
movie_title                    0
dtype: int64

In [32]:
cleaned_data_filled = data.fillna(value=0)

In [33]:
# Show the data after the missing values were filled with 0
cleaned_data_filled


Unnamed: 0,movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Avatar,Color,James Cameron,723.0,178,0.0,855.0,Joel David Moore,1000.0,760505847.0,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000.0
1,Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0.0
2,Spectre,Color,Sam Mendes,602.0,148,0.0,161.0,Rory Kinnear,11000.0,200074175.0,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000.0
3,The Dark Knight Rises,Color,Christopher Nolan,813.0,164,22000.0,23000.0,Christian Bale,27000.0,448130642.0,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000.0
4,Star Wars: Episode VII - The Force Awakens ...,0,Doug Walker,0.0,,131.0,0.0,Rob Walker,131.0,0.0,...,0.0,0,0,0,0.0,0.0,12.0,7.1,0.00,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5039,The Following,Color,0,43.0,43,0.0,319.0,Valorie Curry,841.0,0.0,...,359.0,English,USA,TV-14,0.0,0.0,593.0,7.5,16.00,32000.0
5040,A Plague So Pleasant,Color,Benjamin Roberds,13.0,76,0.0,0.0,Maxwell Moody,0.0,0.0,...,3.0,English,USA,0,1400.0,2013.0,0.0,6.3,0.00,16.0
5041,Shanghai Calling,Color,Daniel Hsia,14.0,100,0.0,489.0,Daniel Henney,946.0,10443.0,...,9.0,English,USA,PG-13,0.0,2012.0,719.0,6.3,2.35,660.0
5042,My Date with Drew,Color,Jon Gunn,43.0,90,16.0,16.0,Brian Herzlinger,86.0,85222.0,...,84.0,English,USA,PG,1100.0,2004.0,23.0,6.6,1.85,456.0


In [34]:
cleaned_data_filled.isnull().sum().sort_values(ascending=False)


movie_title                  0
color                        0
aspect_ratio                 0
imdb_score                   0
actor_2_facebook_likes       0
title_year                   0
budget                       0
content_rating               0
country                      0
language                     0
num_user_for_reviews         0
movie_imdb_link              0
plot_keywords                0
facenumber_in_poster         0
actor_3_name                 0
cast_total_facebook_likes    0
num_voted_users              0
actor_1_name                 0
genres                       0
gross                        0
actor_1_facebook_likes       0
actor_2_name                 0
actor_3_facebook_likes       0
director_facebook_likes      0
duration                     0
num_critic_for_reviews       0
director_name                0
movie_facebook_likes         0
dtype: int64

In [35]:
# Remove every row that contains at least one missing value
cleaned_data_removed = data.dropna(axis=0, how='any')


In [36]:
# Confirm that no missing values remain in the processed data
cleaned_data_removed.isnull().sum().sort_values(ascending=False)


movie_title                  0
color                        0
aspect_ratio                 0
imdb_score                   0
actor_2_facebook_likes       0
title_year                   0
budget                       0
content_rating               0
country                      0
language                     0
num_user_for_reviews         0
movie_imdb_link              0
plot_keywords                0
facenumber_in_poster         0
actor_3_name                 0
cast_total_facebook_likes    0
num_voted_users              0
actor_1_name                 0
genres                       0
gross                        0
actor_1_facebook_likes       0
actor_2_name                 0
actor_3_facebook_likes       0
director_facebook_likes      0
duration                     0
num_critic_for_reviews       0
director_name                0
movie_facebook_likes         0
dtype: int64

In [37]:
cleaned_data_removed.shape, data.shape

((3755, 28), (5044, 28))
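
The two cleaning strategies trade off differently: `fillna` keeps every row but writes the fill value everywhere, including text columns (note the 0 under `color` and `language` in the filled table above), while `dropna` keeps only complete rows. The sketch below is an added illustration on a tiny made-up table, not the movie dataset; it passes a dictionary to `fillna` so that each column gets a fill value of a suitable type.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this example.
sample = pd.DataFrame({
    "movie_title": ["Film A", "Film B", "Film C"],
    "director_name": ["Director X", None, "Director Z"],
    "gross": [100.0, 250.0, np.nan],
})

# Fill each column with a value that matches its type.
filled_by_column = sample.fillna({"director_name": "Unknown", "gross": 0})

# Keep only the rows without any missing value.
removed = sample.dropna(axis=0, how="any")

print(filled_by_column.shape, removed.shape)   # (3, 3) (1, 3)
```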

In [38]:
#To have a CSV copy of Cleaned Data
cleaned_data_filled.to_csv("cleaned_movie_data.csv")

### G.1. Slicing Dataframes

In [39]:
A = data[:5]
A

Unnamed: 0,movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000.0
1,Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0.0
2,Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000.0
3,The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000.0
4,Star Wars: Episode VII - The Force Awakens ...,,Doug Walker,,,131.0,,Rob Walker,131.0,,...,,,,,,,12.0,7.1,,0.0


### G.2. Indexing Columns

In [40]:
B = data.director_name[:5]
B

0        James Cameron
1       Gore Verbinski
2           Sam Mendes
3    Christopher Nolan
4          Doug Walker
Name: director_name, dtype: object

In [41]:
C = data.actor_2_name[:10] 
C

0     Joel David Moore
1        Orlando Bloom
2         Rory Kinnear
3       Christian Bale
4           Rob Walker
5      Samantha Morton
6         James Franco
7         Donna Murphy
8    Robert Downey Jr.
9     Daniel Radcliffe
Name: actor_2_name, dtype: object

In [85]:
# Show the movie_title and director_name of the first five rows
# Alternative using column positions: d = data.iloc[:5, [0, 2]]
d = data[["movie_title", "director_name"]][:5]
d


Unnamed: 0,movie_title,director_name
0,Avatar,James Cameron
1,Pirates of the Caribbean: At World's End,Gore Verbinski
2,Spectre,Sam Mendes
3,The Dark Knight Rises,Christopher Nolan
4,Star Wars: Episode VII - The Force Awakens ...,Doug Walker


In [None]:
column2 = 
E = 
E

### G.3. Indexing Rows

In [61]:
F = data.iloc[10:12]
F

Unnamed: 0,movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
10,Batman v Superman: Dawn of Justice,Color,Zack Snyder,673.0,183,0.0,2000.0,Lauren Cohan,15000.0,330249062.0,...,3018.0,English,USA,PG-13,250000000.0,2016.0,4000.0,6.9,2.35,197000.0
11,Superman Returns,Color,Bryan Singer,434.0,169,0.0,903.0,Marlon Brando,18000.0,200069408.0,...,2367.0,English,USA,PG-13,209000000.0,2006.0,10000.0,6.1,2.35,0.0


### G.4. Search for James Cameron Directed Movies

In [56]:
G = data[data.director_name == "James Cameron"]
G

Unnamed: 0,movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Avatar,Color,James Cameron,723.0,178,0.0,855.0,Joel David Moore,1000.0,760505847.0,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000.0
26,Titanic,Color,James Cameron,315.0,194,0.0,794.0,Kate Winslet,29000.0,658672302.0,...,2528.0,English,USA,PG-13,200000000.0,1997.0,14000.0,7.7,2.35,26000.0
288,Terminator 2: Judgment Day,Color,James Cameron,210.0,153,0.0,539.0,Jenette Goldstein,780.0,204843350.0,...,983.0,English,USA,R,102000000.0,1991.0,604.0,8.5,2.35,13000.0
291,True Lies,Color,James Cameron,94.0,141,0.0,618.0,Tia Carrere,2000.0,146282411.0,...,351.0,English,USA,R,115000000.0,1994.0,1000.0,7.2,2.35,0.0
606,The Abyss,Color,James Cameron,82.0,171,0.0,638.0,Todd Graff,2000.0,54222000.0,...,380.0,English,USA,PG-13,69500000.0,1989.0,650.0,7.6,2.35,0.0
2486,Aliens,Color,James Cameron,250.0,154,0.0,604.0,Carrie Henn,2000.0,85200000.0,...,1076.0,English,USA,R,18500000.0,1986.0,626.0,8.4,1.85,18000.0
3575,The Terminator,Color,James Cameron,204.0,107,0.0,255.0,Brian Thompson,2000.0,38400000.0,...,692.0,English,UK,R,6500000.0,1984.0,663.0,8.1,1.85,13000.0


### G.5. Sort Films by Gross Earnings

In [60]:
sorted_data = data.sort_values(by="gross", ascending=False)
H = sorted_data[:5]
H

Unnamed: 0,movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Avatar,Color,James Cameron,723.0,178,0.0,855.0,Joel David Moore,1000.0,760505847.0,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000.0
26,Titanic,Color,James Cameron,315.0,194,0.0,794.0,Kate Winslet,29000.0,658672302.0,...,2528.0,English,USA,PG-13,200000000.0,1997.0,14000.0,7.7,2.35,26000.0
29,Jurassic World,Color,Colin Trevorrow,644.0,124,365.0,1000.0,Judy Greer,3000.0,652177271.0,...,1290.0,English,USA,PG-13,150000000.0,2015.0,2000.0,7.0,2.0,150000.0
794,The Avengers,Color,Joss Whedon,703.0,173,0.0,19000.0,Robert Downey Jr.,26000.0,623279547.0,...,1722.0,English,USA,PG-13,220000000.0,2012.0,21000.0,8.1,1.85,123000.0
17,The Avengers,Color,Joss Whedon,703.0,173,0.0,19000.0,Robert Downey Jr.,26000.0,623279547.0,...,1722.0,English,USA,PG-13,220000000.0,2012.0,21000.0,8.1,1.85,123000.0


### G.6. Multiple Conditions: Find Films from Canada with Hugh Jackman as the actor_1_name

In [83]:
I = (data["country"] == "Canada") & (data["actor_1_name"] == "Hugh Jackman")
J = data[I]
column1 = ["movie_title", "country", "actor_1_name"]
K = J[column1][:5]
K


Unnamed: 0,movie_title,country,actor_1_name
34,X-Men: The Last Stand,Canada,Hugh Jackman
210,X-Men 2,Canada,Hugh Jackman


## Challenge! Get the Top 5 Highest-Grossing Films Directed by Michael Bay

In [91]:
# Alternative using column positions:
# d = data[data["director_name"] == "Michael Bay"].sort_values(by="gross", ascending=False).iloc[:5, [0, 2, 9]]
d = data[data["director_name"] == "Michael Bay"].sort_values(by="gross", ascending=False)[["movie_title", "director_name", "gross"]][:5]
d



Unnamed: 0,movie_title,director_name,gross
36,Transformers: Revenge of the Fallen,Michael Bay,402076689.0
53,Transformers: Dark of the Moon,Michael Bay,352358779.0
112,Transformers,Michael Bay,318759914.0
37,Transformers: Age of Extinction,Michael Bay,245428137.0
151,Armageddon,Michael Bay,201573391.0


##### PREPARED BY: 
###### Dr. Robert G. de Luna, PECE

rgdeluna@pup.edu.ph,
robert.deluna.phd@gmail.com