# Methods for handling missing values
**pandas provides the following methods to handle missing values:**

- `isna`: Returns a Series of booleans based on whether each value is missing or not.
- `notna`: The exact opposite of `isna`.
- `fillna`: Fills missing values in a variety of ways.
- `dropna`: Drops the missing values from the Series
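As a quick illustration of the four methods listed above, here is a minimal sketch on a small toy Series. The data is made up for illustration and is not taken from `movie.csv`.

```python
import numpy as np
import pandas as pd

# Toy Series with two missing values (illustrative data only)
s = pd.Series([1.0, np.nan, 3.0, np.nan])

missing_mask = s.isna()    # True where the value is missing
present_mask = s.notna()   # True where the value is present
filled = s.fillna(0)       # missing values replaced by 0
dropped = s.dropna()       # missing values removed, original index labels kept

print(missing_mask.tolist())
print(filled.tolist())
print(dropped.index.tolist())
```

None of these calls modifies `s`; each one returns a new object.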

In [1]:
import pandas as pd

In [2]:
# let us read the movie dataset
movie = pd.read_csv("data/movie.csv")

In [4]:
# show the first five rows of movie
movie.head(5)

Unnamed: 0,title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
0,Avatar,2009.0,Color,PG-13,178.0,James Cameron,0.0,CCH Pounder,1000.0,Joel David Moore,...,855.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,723.0,886204,avatar|future|marine|native|paraplegic,English,USA,237000000.0,7.9
1,Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,...,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1
2,Spectre,2015.0,Color,PG-13,148.0,Sam Mendes,0.0,Christoph Waltz,11000.0,Rory Kinnear,...,161.0,200074175.0,Action|Adventure|Thriller,602.0,275868,bomb|espionage|sequel|spy|terrorist,English,UK,245000000.0,6.8
3,The Dark Knight Rises,2012.0,Color,PG-13,164.0,Christopher Nolan,22000.0,Tom Hardy,27000.0,Christian Bale,...,23000.0,448130642.0,Action|Thriller,813.0,1144337,deception|imprisonment|lawlessness|police offi...,English,USA,250000000.0,8.5
4,Star Wars: Episode VII - The Force Awakens,,,,,Doug Walker,131.0,Doug Walker,131.0,Rob Walker,...,,,Documentary,,8,,,,,7.1


In [5]:
# to check if any cell has missing value in the movie dataset:
movie.isna()

Unnamed: 0,title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,True,True,True,True,False,False,False,False,False,...,True,True,False,True,False,True,True,True,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4911,False,False,False,True,False,False,False,False,False,False,...,False,True,False,False,False,False,False,False,True,False
4912,False,True,False,False,False,True,True,False,False,False,...,False,True,False,False,False,False,False,False,True,False
4913,False,False,False,True,False,False,False,False,False,False,...,False,True,False,False,False,True,False,False,False,False
4914,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,True,False,False,True,False


In [6]:
# sum() adds up each column of the boolean DataFrame; since True counts as 1, it returns the number of missing values per column
movie.isna().sum()

title                0
year               106
color               19
content_rating     300
duration            15
director_name      102
director_fb        102
actor1               7
actor1_fb            7
actor2              13
actor2_fb           13
actor3              23
actor3_fb           23
gross              862
genres               0
num_reviews         49
num_voted_users      0
plot_keywords      152
language            14
country              5
budget             484
imdb_score           0
dtype: int64

In [7]:
#number of missing values in year column
movie["year"].isna().sum()


np.int64(106)

In [10]:
#get all rows in which the year column is missing
filter_1 = movie["year"].isna()
missing_year = movie[filter_1]
missing_year

Unnamed: 0,title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
4,Star Wars: Episode VII - The Force Awakens,,,,,Doug Walker,131.0,Doug Walker,131.0,Rob Walker,...,,,Documentary,,8,,,,,7.1
176,Miami Vice,,Color,TV-14,60.0,,,Don Johnson,982.0,Philip Michael Thomas,...,184.0,,Action|Crime|Drama|Mystery|Thriller,21.0,16769,cult tv|detective|drugs|police|undercover,English,USA,1500000.0,7.5
257,The A-Team,,Color,TV-PG,60.0,,,George Peppard,669.0,Dirk Benedict,...,432.0,,Action|Adventure|Crime,29.0,25402,1980s|cult tv|famous opening theme|good versus...,English,USA,,7.6
276,"10,000 B.C.",,,,22.0,Christopher Barnard,0.0,Mathew Buck,5.0,,...,,,Comedy,,6,,,,,7.2
398,Hannibal,,Color,TV-14,44.0,,,Caroline Dhavernas,544.0,Scott Thompson,...,148.0,,Crime|Drama|Horror|Mystery|Thriller,103.0,159910,blood|cannibalism|fbi|manipulation|psychiatrist,English,USA,,8.6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4683,Heroes,,Color,TV-14,60.0,,,Sendhil Ramamurthy,1000.0,Masi Oka,...,833.0,,Drama|Fantasy|Sci-Fi|Thriller,75.0,202115,father daughter relationship|serial killer|sup...,English,USA,,7.7
4688,Home Movies,,Color,TV-PG,22.0,,,Brendon Small,59.0,Ron Lynch,...,6.0,,Animation|Comedy|Drama,11.0,7458,coach|friend|school|series|tv series,English,USA,,8.2
4704,Revolution,,Color,TV-14,43.0,,,Billy Burke,2000.0,Tracy Spiridakos,...,576.0,,Action|Adventure|Drama|Sci-Fi,23.0,72017,2020s|near future|one word series title|post a...,English,USA,,6.7
4752,Happy Valley,,Color,TV-MA,58.0,,,Shirley Henderson,887.0,James Norton,...,250.0,,Crime|Drama,11.0,12848,caravan|police|police sergeant|tied to a chair...,English,UK,,8.5


In [11]:
missing_year = movie[movie["year"].isna()]
missing_year

Unnamed: 0,title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
4,Star Wars: Episode VII - The Force Awakens,,,,,Doug Walker,131.0,Doug Walker,131.0,Rob Walker,...,,,Documentary,,8,,,,,7.1
176,Miami Vice,,Color,TV-14,60.0,,,Don Johnson,982.0,Philip Michael Thomas,...,184.0,,Action|Crime|Drama|Mystery|Thriller,21.0,16769,cult tv|detective|drugs|police|undercover,English,USA,1500000.0,7.5
257,The A-Team,,Color,TV-PG,60.0,,,George Peppard,669.0,Dirk Benedict,...,432.0,,Action|Adventure|Crime,29.0,25402,1980s|cult tv|famous opening theme|good versus...,English,USA,,7.6
276,"10,000 B.C.",,,,22.0,Christopher Barnard,0.0,Mathew Buck,5.0,,...,,,Comedy,,6,,,,,7.2
398,Hannibal,,Color,TV-14,44.0,,,Caroline Dhavernas,544.0,Scott Thompson,...,148.0,,Crime|Drama|Horror|Mystery|Thriller,103.0,159910,blood|cannibalism|fbi|manipulation|psychiatrist,English,USA,,8.6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4683,Heroes,,Color,TV-14,60.0,,,Sendhil Ramamurthy,1000.0,Masi Oka,...,833.0,,Drama|Fantasy|Sci-Fi|Thriller,75.0,202115,father daughter relationship|serial killer|sup...,English,USA,,7.7
4688,Home Movies,,Color,TV-PG,22.0,,,Brendon Small,59.0,Ron Lynch,...,6.0,,Animation|Comedy|Drama,11.0,7458,coach|friend|school|series|tv series,English,USA,,8.2
4704,Revolution,,Color,TV-14,43.0,,,Billy Burke,2000.0,Tracy Spiridakos,...,576.0,,Action|Adventure|Drama|Sci-Fi,23.0,72017,2020s|near future|one word series title|post a...,English,USA,,6.7
4752,Happy Valley,,Color,TV-MA,58.0,,,Shirley Henderson,887.0,James Norton,...,250.0,,Crime|Drama,11.0,12848,caravan|police|police sergeant|tied to a chair...,English,UK,,8.5


In [12]:
# count() counts the non-missing values in the column; here it is the same as notna().sum()
print(movie["year"].count())
print(movie["year"].notna().sum())

4810
4810


In [18]:
print(movie["year"].shape)
print(movie["year"].count())
print(movie["year"].notna().sum())
print(movie["year"].isna().sum())

(4916,)
4810
4810
106
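The output above shows that the non-missing count plus the missing count equals the length of the column (4810 + 106 = 4916). A minimal sketch of the same identity on a toy Series, with made-up values:

```python
import numpy as np
import pandas as pd

# Toy "year" column with two missing entries (illustrative data only)
years = pd.Series([2009.0, np.nan, 2015.0, np.nan, 2012.0])

n_total = len(years)             # all rows, missing or not
n_present = years.count()        # non-missing values only
n_missing = years.isna().sum()   # missing values only

print(n_total, n_present, n_missing)
```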


# use fillna to fill missing values
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html

In [19]:
movie.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4916 entries, 0 to 4915
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   title            4916 non-null   object 
 1   year             4810 non-null   float64
 2   color            4897 non-null   object 
 3   content_rating   4616 non-null   object 
 4   duration         4901 non-null   float64
 5   director_name    4814 non-null   object 
 6   director_fb      4814 non-null   float64
 7   actor1           4909 non-null   object 
 8   actor1_fb        4909 non-null   float64
 9   actor2           4903 non-null   object 
 10  actor2_fb        4903 non-null   float64
 11  actor3           4893 non-null   object 
 12  actor3_fb        4893 non-null   float64
 13  gross            4054 non-null   float64
 14  genres           4916 non-null   object 
 15  num_reviews      4867 non-null   float64
 16  num_voted_users  4916 non-null   int64  
 17  plot_keywords 

In [20]:
# fill any missing year with 2024 (returns a new Series; movie is not modified)
movie["year"].fillna(2024)

0       2009.0
1       2007.0
2       2015.0
3       2012.0
4       2024.0
         ...  
4911    2013.0
4912    2024.0
4913    2013.0
4914    2012.0
4915    2004.0
Name: year, Length: 4916, dtype: float64


In [21]:
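# chained inplace fill: it emits the FutureWarning shown below; prefer movie["year"] = movie["year"].fillna(2024)
movie["year"].fillna(2024,inplace=True)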

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  movie["year"].fillna(2024,inplace=True)
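The warning above suggests two alternatives to the chained `inplace=True` call. A minimal sketch of both on toy DataFrames (the names `toy` and `toy2` and their data are made up for illustration):

```python
import numpy as np
import pandas as pd

# Option 1: assign the filled column back to the DataFrame
toy = pd.DataFrame({"title": ["a", "b", "c"], "year": [2009.0, np.nan, 2015.0]})
toy["year"] = toy["year"].fillna(2024)

# Option 2: call fillna on the DataFrame with a {column: value} mapping
toy2 = pd.DataFrame({"title": ["a", "b"], "year": [np.nan, 2001.0]})
toy2 = toy2.fillna({"year": 2024})

print(toy["year"].tolist())
print(toy2["year"].tolist())
```

Both forms operate on the DataFrame itself rather than on an intermediate column object, which is what the warning asks for.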


In [22]:
movie.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4916 entries, 0 to 4915
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   title            4916 non-null   object 
 1   year             4916 non-null   float64
 2   color            4897 non-null   object 
 3   content_rating   4616 non-null   object 
 4   duration         4901 non-null   float64
 5   director_name    4814 non-null   object 
 6   director_fb      4814 non-null   float64
 7   actor1           4909 non-null   object 
 8   actor1_fb        4909 non-null   float64
 9   actor2           4903 non-null   object 
 10  actor2_fb        4903 non-null   float64
 11  actor3           4893 non-null   object 
 12  actor3_fb        4893 non-null   float64
 13  gross            4054 non-null   float64
 14  genres           4916 non-null   object 
 15  num_reviews      4867 non-null   float64
 16  num_voted_users  4916 non-null   int64  
 17  plot_keywords 

In [41]:
# rows whose year is 2024, i.e. the rows that were filled above
complete_year = movie[movie["year"] == 2024]
count_2024 = movie['year'].value_counts().get(2024, 0)
count_2024

np.int64(106)

In [43]:
# use dropna to drop missing values
# drop any row in which duration is missing
movie = movie.dropna(subset=["duration"])
movie.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4901 entries, 0 to 4915
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   title            4901 non-null   object 
 1   year             4901 non-null   float64
 2   color            4883 non-null   object 
 3   content_rating   4613 non-null   object 
 4   duration         4901 non-null   float64
 5   director_name    4801 non-null   object 
 6   director_fb      4801 non-null   float64
 7   actor1           4894 non-null   object 
 8   actor1_fb        4894 non-null   float64
 9   actor2           4888 non-null   object 
 10  actor2_fb        4888 non-null   float64
 11  actor3           4880 non-null   object 
 12  actor3_fb        4880 non-null   float64
 13  gross            4052 non-null   float64
 14  genres           4901 non-null   object 
 15  num_reviews      4856 non-null   float64
 16  num_voted_users  4901 non-null   int64  
 17  plot_keywords    47
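For comparison with the `subset=` call above, here is a minimal sketch of how `dropna` behaves with its default settings, with `how="all"`, and with `subset` on a toy DataFrame (made-up data):

```python
import numpy as np
import pandas as pd

# Toy frame: row 0 is complete, row 1 misses year, row 2 misses everything
toy = pd.DataFrame({
    "year": [2009.0, np.nan, np.nan],
    "duration": [178.0, 60.0, np.nan],
})

any_missing = toy.dropna()                   # default how="any": drop rows with at least one NaN
all_missing = toy.dropna(how="all")          # drop only rows where every column is NaN
by_subset = toy.dropna(subset=["duration"])  # consider only the duration column

print(any_missing.index.tolist())
print(all_missing.index.tolist())
print(by_subset.index.tolist())
```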

### Exercises:

In [44]:
# count rows in which color is missing
movie['color'].isna().sum()

np.int64(18)

In [47]:
# drop rows in which color is missing, using a new variable
drop_missing_color = movie.dropna(subset=['color'])
movie = drop_missing_color

In [48]:
movie.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4883 entries, 0 to 4915
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   title            4883 non-null   object 
 1   year             4883 non-null   float64
 2   color            4883 non-null   object 
 3   content_rating   4600 non-null   object 
 4   duration         4883 non-null   float64
 5   director_name    4785 non-null   object 
 6   director_fb      4785 non-null   float64
 7   actor1           4876 non-null   object 
 8   actor1_fb        4876 non-null   float64
 9   actor2           4871 non-null   object 
 10  actor2_fb        4871 non-null   float64
 11  actor3           4863 non-null   object 
 12  actor3_fb        4863 non-null   float64
 13  gross            4050 non-null   float64
 14  genres           4883 non-null   object 
 15  num_reviews      4840 non-null   float64
 16  num_voted_users  4883 non-null   int64  
 17  plot_keywords    47

In [53]:
# fill rows in which color is missing with "color"
# note: fillna returns a new Series, and the result is not assigned back here
movie['color'].fillna('color')
count_fill = movie['color'].value_counts().get('color',0)
count_fill
# it equals 0 because the rows with missing color were dropped earlier

0

# Sorting:

The `sort_values` method sorts a Series or DataFrame **from least to greatest by default**.

It places **missing values at the end** unless `na_position='first'` is passed.
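A minimal sketch of these defaults on a toy Series (made-up values), before moving on to DataFrames:

```python
import numpy as np
import pandas as pd

# Toy Series with one missing value (illustrative data only)
s = pd.Series([3.0, np.nan, 1.0, 2.0])

asc = s.sort_values()  # ascending, NaN placed last by default
desc_na_first = s.sort_values(ascending=False, na_position="first")  # descending, NaN first

print(asc.index.tolist())
print(desc_na_first.index.tolist())
```

The index labels travel with their values, so printing the index shows the new row order.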

In [54]:
df = pd.DataFrame({
    'col1': ['A', 'A', 'B', None, 'D', 'C'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
    'col4': ['a', 'B', 'c', 'D', 'e', 'F']
})
df

Unnamed: 0,col1,col2,col3,col4
0,A,2,0,a
1,A,1,1,B
2,B,9,9,c
3,,8,4,D
4,D,7,2,e
5,C,4,3,F


In [55]:
df.sort_values(by=['col1'])

Unnamed: 0,col1,col2,col3,col4
0,A,2,0,a
1,A,1,1,B
2,B,9,9,c
5,C,4,3,F
4,D,7,2,e
3,,8,4,D


In [56]:
#Sort by multiple columns
df.sort_values(by=['col1', 'col2'])

Unnamed: 0,col1,col2,col3,col4
1,A,1,1,B
0,A,2,0,a
2,B,9,9,c
5,C,4,3,F
4,D,7,2,e
3,,8,4,D


In [57]:
#Sort Descending
df.sort_values(by='col1', ascending=False)

Unnamed: 0,col1,col2,col3,col4
4,D,7,2,e
5,C,4,3,F
2,B,9,9,c
0,A,2,0,a
1,A,1,1,B
3,,8,4,D


In [58]:
#Putting NAs first
df.sort_values(by='col1', ascending=False, na_position='first')

Unnamed: 0,col1,col2,col3,col4
3,,8,4,D
4,D,7,2,e
5,C,4,3,F
2,B,9,9,c
0,A,2,0,a
1,A,1,1,B


In [85]:
movieDf = pd.read_csv("data/movie.csv")

In [86]:
movieDf.head()

Unnamed: 0,title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
0,Avatar,2009.0,Color,PG-13,178.0,James Cameron,0.0,CCH Pounder,1000.0,Joel David Moore,...,855.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,723.0,886204,avatar|future|marine|native|paraplegic,English,USA,237000000.0,7.9
1,Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,...,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1
2,Spectre,2015.0,Color,PG-13,148.0,Sam Mendes,0.0,Christoph Waltz,11000.0,Rory Kinnear,...,161.0,200074175.0,Action|Adventure|Thriller,602.0,275868,bomb|espionage|sequel|spy|terrorist,English,UK,245000000.0,6.8
3,The Dark Knight Rises,2012.0,Color,PG-13,164.0,Christopher Nolan,22000.0,Tom Hardy,27000.0,Christian Bale,...,23000.0,448130642.0,Action|Thriller,813.0,1144337,deception|imprisonment|lawlessness|police offi...,English,USA,250000000.0,8.5
4,Star Wars: Episode VII - The Force Awakens,,,,,Doug Walker,131.0,Doug Walker,131.0,Rob Walker,...,,,Documentary,,8,,,,,7.1


# pandas sort_values with examples:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html

In [87]:
# ascending=False if we want desc order
movieDf.sort_values(by="year",ascending=False)

Unnamed: 0,title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
2211,Nerve,2016.0,Color,PG-13,96.0,Henry Joost,24.0,Samira Wiley,646.0,Marc John Jefferies,...,374.0,28876924.0,Adventure|Crime|Mystery|Sci-Fi|Thriller,86.0,4303,dare|game|knocked out|motorcycle|online game,English,USA,20000000.0,7.1
2083,"Hail, Caesar!",2016.0,Color,PG-13,106.0,Ethan Coen,1000.0,Scarlett Johansson,19000.0,Channing Tatum,...,1000.0,29997095.0,Comedy|Mystery,423.0,60926,50s|film within a film|hollywood|illegitimate ...,English,UK,22000000.0,6.4
73,Suicide Squad,2016.0,Color,PG-13,123.0,David Ayer,452.0,Will Smith,10000.0,Robin Atkin Downes,...,329.0,161087183.0,Action|Adventure|Comedy|Sci-Fi,418.0,118992,based on comic book|critically bashed|father d...,English,USA,175000000.0,6.9
4355,The Dog Lover,2016.0,Color,PG,101.0,Alex Ranarivelo,20.0,Lea Thompson,1000.0,Christina Moore,...,256.0,,Drama,9.0,162,,English,USA,2000000.0,4.8
2077,Our Kind of Traitor,2016.0,Color,R,108.0,Susanna White,24.0,Radivoje Bukvic,150.0,Pawel Szajda,...,100.0,3108216.0,Thriller,134.0,2587,based on novel|male frontal nudity|male nudity...,English,UK,,6.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4683,Heroes,,Color,TV-14,60.0,,,Sendhil Ramamurthy,1000.0,Masi Oka,...,833.0,,Drama|Fantasy|Sci-Fi|Thriller,75.0,202115,father daughter relationship|serial killer|sup...,English,USA,,7.7
4688,Home Movies,,Color,TV-PG,22.0,,,Brendon Small,59.0,Ron Lynch,...,6.0,,Animation|Comedy|Drama,11.0,7458,coach|friend|school|series|tv series,English,USA,,8.2
4704,Revolution,,Color,TV-14,43.0,,,Billy Burke,2000.0,Tracy Spiridakos,...,576.0,,Action|Adventure|Drama|Sci-Fi,23.0,72017,2020s|near future|one word series title|post a...,English,USA,,6.7
4752,Happy Valley,,Color,TV-MA,58.0,,,Shirley Henderson,887.0,James Norton,...,250.0,,Crime|Drama,11.0,12848,caravan|police|police sergeant|tied to a chair...,English,UK,,8.5


In [88]:
# sort by title in ascending order; na_position='first' would place missing titles first (title has no missing values)
movieDf.sort_values(by="title",na_position='first')

Unnamed: 0,title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
4349,#Horror,2015.0,Color,Not Rated,101.0,Tara Subkoff,37.0,Timothy Hutton,501.0,Balthazar Getty,...,56.0,,Drama|Horror|Mystery|Thriller,35.0,1547,bullying|cyberbullying|girl|internet|throat sl...,English,USA,1500000.0,3.3
3629,10 Cloverfield Lane,2016.0,Color,PG-13,104.0,Dan Trachtenberg,16.0,Bradley Cooper,14000.0,John Gallagher Jr.,...,82.0,71897215.0,Drama|Horror|Mystery|Sci-Fi|Thriller,411.0,126893,alien|bunker|car crash|kidnapping|minimal cast,English,USA,15000000.0,7.3
2964,10 Days in a Madhouse,2015.0,Color,R,111.0,Timothy Hines,0.0,Christopher Lambert,1000.0,Kelly LeBrock,...,247.0,14616.0,Drama,1.0,314,,English,USA,12000000.0,7.5
2799,10 Things I Hate About You,1999.0,Color,PG-13,97.0,Gil Junger,19.0,Joseph Gordon-Levitt,23000.0,Heath Ledger,...,835.0,38176108.0,Comedy|Drama|Romance,133.0,222099,dating|protective father|school|shrew|teen movie,English,USA,16000000.0,7.2
276,"10,000 B.C.",,,,22.0,Christopher Barnard,0.0,Mathew Buck,5.0,,...,,,Comedy,,6,,,,,7.2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3597,[Rec] 2,2009.0,Color,R,85.0,Jaume Balagueró,57.0,Jonathan D. Mellor,37.0,Pablo Rosso,...,6.0,27024.0,Horror,222.0,55597,apartment|apartment building|blood sample|cruc...,Spanish,Spain,5600000.0,6.6
2127,eXistenZ,1999.0,Color,R,115.0,David Cronenberg,0.0,Jennifer Jason Leigh,1000.0,Sarah Polley,...,716.0,2840417.0,Horror|Sci-Fi|Thriller,196.0,77493,assassin|game|game designer|pod|virtual reality,English,Canada,31000000.0,6.8
579,xXx,2002.0,Color,PG-13,132.0,Rob Cohen,357.0,Vin Diesel,14000.0,Eve,...,212.0,141204016.0,Action|Adventure|Thriller,191.0,142569,agent|nsa|nsa agent|prague|russian,English,USA,70000000.0,5.8
782,xXx: State of the Union,2005.0,Color,PG-13,101.0,Lee Tamahori,93.0,Sunny Mabrey,287.0,Nona Gaye,...,218.0,26082914.0,Action|Adventure|Crime|Thriller,77.0,51349,coup d'etat|mutiny|president|u.s. navy|washing...,English,USA,87000000.0,4.3


## Exercises

### Exercise 1
<span  style="color:green; font-size:16px">What percentage of actor 1 Facebook likes are missing? >>Ahmed answer : done </span>


### Exercise 2
<span  style="color:green; font-size:16px">Use the notna method to find the number of non-missing values in the actor 1 Facebook like column. Verify this
number is the same as the result of the count method. >>Ahmed answer : yes it is the same </span>

### Exercise 3
<span  style="color:green; font-size:16px">Use one line of code to fill the missing values of actor1_fb with the maximum of actor2_fb. Save this result to
variable actor1_fb_full  >>Ahmed answer : done </span>

### Exercise 4
<span  style="color:green; font-size:16px">Verify the results of problem 3 by selecting just the values of actor1_fb_full that were filled by actor2_fb. >>Ahmed answer : done </span>


In [89]:
movieDf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4916 entries, 0 to 4915
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   title            4916 non-null   object 
 1   year             4810 non-null   float64
 2   color            4897 non-null   object 
 3   content_rating   4616 non-null   object 
 4   duration         4901 non-null   float64
 5   director_name    4814 non-null   object 
 6   director_fb      4814 non-null   float64
 7   actor1           4909 non-null   object 
 8   actor1_fb        4909 non-null   float64
 9   actor2           4903 non-null   object 
 10  actor2_fb        4903 non-null   float64
 11  actor3           4893 non-null   object 
 12  actor3_fb        4893 non-null   float64
 13  gross            4054 non-null   float64
 14  genres           4916 non-null   object 
 15  num_reviews      4867 non-null   float64
 16  num_voted_users  4916 non-null   int64  
 17  plot_keywords 

In [90]:
actor1_count = movieDf['actor1_fb'].count()
actor1_count

np.int64(4909)

In [91]:
# Exercise 2: count the non-missing values with notna; the result matches count() above
actor1_count = movieDf['actor1_fb'].notna().sum()
actor1_count

np.int64(4909)

In [100]:
total = movieDf.shape[0]

In [92]:
actor1_missing = movieDf['actor1_fb'].isna().sum()
actor1_missing

np.int64(7)

In [101]:
# Exercise 1: divide the number of missing rows by the total number of rows
# the result is a fraction; multiply by 100 for a percentage (about 0.14%)
percentage_actor1_fb_missing = actor1_missing / total
percentage_actor1_fb_missing

np.float64(0.0014239218877135883)
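The same fraction can be computed in one step: the mean of a boolean mask is the share of `True` values. A minimal sketch on a toy Series (made-up values, not the real `actor1_fb` column):

```python
import numpy as np
import pandas as pd

# Toy likes column with one missing value out of four (illustrative data only)
likes = pd.Series([1000.0, np.nan, 40000.0, 11000.0])

fraction_missing = likes.isna().mean()  # share of rows that are missing
pct_missing = fraction_missing * 100    # same value expressed as a percentage

print(pct_missing)
```

On the notebook's data, the equivalent call would be `movieDf['actor1_fb'].isna().mean() * 100`.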

In [102]:
# Exercise 3
# Use one line of code to fill the missing values of actor1_fb with the maximum of actor2_fb. Save this result to
# variable actor1_fb_full

movieDf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4916 entries, 0 to 4915
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   title            4916 non-null   object 
 1   year             4810 non-null   float64
 2   color            4897 non-null   object 
 3   content_rating   4616 non-null   object 
 4   duration         4901 non-null   float64
 5   director_name    4814 non-null   object 
 6   director_fb      4814 non-null   float64
 7   actor1           4909 non-null   object 
 8   actor1_fb        4909 non-null   float64
 9   actor2           4903 non-null   object 
 10  actor2_fb        4903 non-null   float64
 11  actor3           4893 non-null   object 
 12  actor3_fb        4893 non-null   float64
 13  gross            4054 non-null   float64
 14  genres           4916 non-null   object 
 15  num_reviews      4867 non-null   float64
 16  num_voted_users  4916 non-null   int64  
 17  plot_keywords 

In [95]:
actor1_fb_full = movieDf['actor1_fb'].fillna(movieDf.loc[:,'actor2_fb'].max())
actor1_fb_full.info()

<class 'pandas.core.series.Series'>
RangeIndex: 4916 entries, 0 to 4915
Series name: actor1_fb
Non-Null Count  Dtype  
--------------  -----  
4916 non-null   float64
dtypes: float64(1)
memory usage: 38.5 KB


In [124]:
# Verify the results of problem 3 by selecting just the values of actor1_fb_full that were filled by actor2_fb
verified_values = actor1_fb_full[movieDf['actor1_fb'].isna()]
verified_values

4401    137000.0
4418    137000.0
4608    137000.0
4721    137000.0
4822    137000.0
4823    137000.0
4864    137000.0
Name: actor1_fb, dtype: float64

# Uniqueness

**There are a few methods that deal with unique values in a Series:**

- `unique`: Returns a numpy array of all the unique values in order of their appearance
- `nunique`: Returns the number of unique values in the Series
- `drop_duplicates`: Returns a pandas Series of just the unique values
- `duplicated`: Returns a boolean Series marking each value that duplicates an earlier one

In [110]:
movieDf["year"].unique()

array([2009., 2007., 2015., 2012.,   nan, 2010., 2016., 2006., 2008.,
       2013., 2011., 2014., 2005., 1997., 2004., 1999., 1995., 2003.,
       2001., 2002., 1998., 2000., 1990., 1991., 1994., 1996., 1982.,
       1993., 1979., 1992., 1989., 1984., 1988., 1978., 1962., 1980.,
       1972., 1981., 1968., 1985., 1940., 1963., 1987., 1986., 1973.,
       1983., 1976., 1977., 1970., 1971., 1969., 1960., 1965., 1964.,
       1927., 1974., 1937., 1975., 1967., 1951., 1961., 1946., 1953.,
       1954., 1959., 1932., 1947., 1956., 1945., 1952., 1930., 1966.,
       1939., 1950., 1948., 1958., 1957., 1943., 1944., 1938., 1949.,
       1936., 1941., 1955., 1942., 1929., 1935., 1933., 1916., 1934.,
       1925., 1920.])

In [106]:
movieDf["year"].nunique()

91

In [112]:
# count the number of unique values 
num = movieDf["year"].nunique()
num

91
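The `unique` array above includes `nan`, yet `nunique` reports 91. By default `nunique` ignores missing values, while `unique` keeps them. A minimal sketch on a toy Series (made-up values):

```python
import numpy as np
import pandas as pd

# Toy Series with a repeated year and two missing values (illustrative data only)
s = pd.Series([2009.0, 2007.0, np.nan, 2009.0, np.nan])

uniques = s.unique()                     # NaN appears once in the result
n_unique = s.nunique()                   # NaN excluded by default
n_unique_with_na = s.nunique(dropna=False)  # NaN counted as one extra value
dup_mask = s.duplicated()                # True for repeats of an earlier value, NaN included

print(len(uniques), n_unique, n_unique_with_na)
print(dup_mask.tolist())
```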

In [113]:
df = pd.DataFrame({"id": [1, 2, 3, 4, 4], "name": ["Ahmed", "Ahmed", "Mohamed", "sara", "sara"]})
df

Unnamed: 0,id,name
0,1,Ahmed
1,2,Ahmed
2,3,Mohamed
3,4,sara
4,4,sara


In [114]:
# count the rows that are exact duplicates of an earlier row
df.duplicated().sum()

np.int64(1)

In [115]:
df.duplicated()

0    False
1    False
2    False
3    False
4     True
dtype: bool

In [117]:
#drop entire duplicate row
df.drop_duplicates()

Unnamed: 0,id,name
0,1,Ahmed
1,2,Ahmed
2,3,Mohamed
3,4,sara


In [118]:
#check if name has duplicates
df.duplicated(subset=["name"])

0    False
1     True
2    False
3    False
4     True
dtype: bool

In [119]:
df.drop_duplicates(subset="name")

Unnamed: 0,id,name
0,1,Ahmed
2,3,Mohamed
3,4,sara
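The call above keeps the first row for each name, which is the default. The `keep` parameter changes which occurrence survives. A minimal sketch that rebuilds the same toy data under the assumed name `people`:

```python
import pandas as pd

# Same toy data as above, rebuilt so that the example is self-contained
people = pd.DataFrame({
    "id": [1, 2, 3, 4, 4],
    "name": ["Ahmed", "Ahmed", "Mohamed", "sara", "sara"],
})

keep_first = people.drop_duplicates(subset="name")               # default keep="first"
keep_last = people.drop_duplicates(subset="name", keep="last")   # keep the last occurrence
keep_none = people.drop_duplicates(subset="name", keep=False)    # drop every duplicated name

print(keep_first.index.tolist())
print(keep_last.index.tolist())
print(keep_none.index.tolist())
```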


In [None]:
# Thanks for these helpful tasks, Ms. Hala