<a href="https://colab.research.google.com/github/TheMaze45/Pandas/blob/main/debugging_challenges.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Debugging Python code

Go through the exercises below. 

- Each exercise contains some code with **one or more mistakes**. 
- A mistake may raise an error, or it may run silently and produce the wrong result.
- There might be multiple ways to fix the mistakes.
- Improving code readability is also encouraged.

In [2]:
# data creation
beatles = ["John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr"]

numbers = [1, 2, 3, 4, 5]

capitals = {"Germany": "Berlin", 
            "Russia": "Moscow",
            "France": "Paris",
            "China": "Beijing",
            "Egypt": "Cairo",
            "Brazil": "Brasilia"  # corrected: the capital of Brazil is Brasilia, not Sao Paulo
            }

top_profitable_films = {
    "Film": ["Avengers: Endgame", "Avatar", "Titanic", "Star Wars: The Force Awakens", "Jurassic World",
             "The Lion King", "The Avengers", "Frozen II", "Frozen", "Beauty and the Beast"],
    "Year": ["2019", "2007", "1997", "2015", "2015", "2019", "2012", "2019", "2013", "2017"],
    "Worldwide Gross (in billions)": ["2.798", "2.789", "2.194", "2.073", "1.673", "1.656", "1.519", 
                                      "1.450", "1.276", "1.263"]
    }

## Exercise 1:

In [None]:
# The first mistake is a typo in the for loop: Capitals (with a capital C) is not defined,
# so Python raises a NameError. The name must be lowercase: capitals.

for c in Capitals.keys():
  print(f"{c} is the capital of {Capitals[c]}.")

In [None]:
# Spelling fixed.
# The loop now runs, but the sentence is the wrong way round: it should read "[city] is the capital of [country]."
for c in capitals.keys():
  print(f"{c} is the capital of {capitals[c]}.")

Germany is the capital of Berlin.
Russia is the capital of Moscow.
France is the capital of Paris.
China is the capital of Beijing.
Egypt is the capital of Cairo.
Brazil is the capital of Brasilia.


In [3]:
# Swapped the order and renamed the loop variable so the loop is easier to read.
for country in capitals:
  print(f"{capitals[country]} is the capital of {country}.")

Berlin is the capital of Germany.
Moscow is the capital of Russia.
Paris is the capital of France.
Beijing is the capital of China.
Cairo is the capital of Egypt.
Brasilia is the capital of Brazil.
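
As an optional alternative (not part of the original exercise), the same output can be produced by iterating over `capitals.items()`, which yields each country together with its capital and avoids the second dictionary lookup. The list name `sentences` is only illustrative.

```python
# Same data as the `capitals` dictionary defined at the top of the notebook.
capitals = {"Germany": "Berlin", "Russia": "Moscow", "France": "Paris",
            "China": "Beijing", "Egypt": "Cairo", "Brazil": "Brasilia"}

# .items() yields (key, value) pairs, so each pass gets country and city directly.
sentences = [f"{city} is the capital of {country}."
             for country, city in capitals.items()]

for sentence in sentences:
    print(sentence)
```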


## Exercise 2:
Let's imagine we want to show our love for Ringo Starr by printing a love statement for him once for each element of the `numbers` list. For every Beatle who is not Ringo, we want to print a hate statement the same number of times. The output should look like this:

```
I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!
```



In [None]:
# The first mistake, in plain sight, is on line 3: a single = assigns a value to a variable;
# to compare two values we need ==. The last print is also indented too far, which is an IndentationError.
for beatle in beatles:
  if beatle = "Ringo Starr":
    for n in numbers:
      print(f"I love {beatle}!")
  if beatle != "Ringo Starr":
    print(f"I hate {beatle}!")
      print("\n")

In [5]:
# Next up is the order of the loops: the outer loop should be "for n in numbers",
# so that each pass prints all four Beatles once.
for beatle in beatles:
  if beatle == "Ringo Starr":
    for n in numbers:
      print(f"I love {beatle}!")
  if beatle != "Ringo Starr":
    print(f"I hate {beatle}!")
    print("\n")

I hate John Lennon!


I hate Paul McCartney!


I hate George Harrison!


I love Ringo Starr!
I love Ringo Starr!
I love Ringo Starr!
I love Ringo Starr!
I love Ringo Starr!


In [6]:
# Now the looping works correctly, but the blank lines are all over the place:
# there should be only one gap, after each pass through the Beatles.
for n in numbers:
  for beatle in beatles:
    if beatle == "Ringo Starr":
      print(f"I love {beatle}!")
    if beatle != "Ringo Starr":
      print(f"I hate {beatle}!")
      print("\n")

I hate John Lennon!


I hate Paul McCartney!


I hate George Harrison!


I love Ringo Starr!
I hate John Lennon!


I hate Paul McCartney!


I hate George Harrison!


I love Ringo Starr!
I hate John Lennon!


I hate Paul McCartney!


I hate George Harrison!


I love Ringo Starr!
I hate John Lennon!


I hate Paul McCartney!


I hate George Harrison!


I love Ringo Starr!
I hate John Lennon!


I hate Paul McCartney!


I hate George Harrison!


I love Ringo Starr!


In [7]:
# Fixed the blank-line problem by moving print("\n") out of the inner loop.
# The syntax could also be simpler: instead of the second if statement on line 7, we can use an else.
for n in numbers:
  for beatle in beatles:
    if beatle == "Ringo Starr":
      print(f"I love {beatle}!")
    if beatle != "Ringo Starr":
      print(f"I hate {beatle}!")
  print("\n")

I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!




In [8]:
for n in numbers:
  for beatle in beatles:
    if beatle == "Ringo Starr":
      print(f"I love {beatle}!")
    else:
      print(f"I hate {beatle}!")
  print("\n")

I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!
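
One further readability option, offered only as a sketch: choose the verb with a conditional expression and collect the lines in a small function. The function `build_statements` and its `favourite` parameter are hypothetical names introduced for illustration.

```python
beatles = ["John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr"]
numbers = [1, 2, 3, 4, 5]


def build_statements(beatles, numbers, favourite="Ringo Starr"):
    """Return the lines to print: one block of statements per entry in numbers."""
    lines = []
    for _ in numbers:  # the value itself is unused; only the count matters
        for beatle in beatles:
            feeling = "love" if beatle == favourite else "hate"
            lines.append(f"I {feeling} {beatle}!")
        lines.append("\n")  # separator after each block, as in the cells above
    return lines


statements = build_statements(beatles, numbers)
for line in statements:
    print(line)
```

Returning the lines instead of printing inside the loops makes the logic easy to check without reading console output.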




## Exercise 3:

In [None]:
# Two things stand out. First, to use the pandas library we must import it, otherwise pd is not defined.
# Second, we should not overwrite the original dictionary; instead, we assign the new DataFrame to a new variable.
# It also helps to mark a variable that holds a DataFrame with a "_df" suffix.
top_profitable_films = pd.DataFrame(top_profitable_films)
top_profitable_films.head

In [15]:
# Something still looks odd: the output starts with "<bound method ...".
# head is a method, so it needs parentheses () to be called.
import pandas as pd

most_prof_films_df = pd.DataFrame(top_profitable_films)
most_prof_films_df.head

<bound method NDFrame.head of                            Film  Year Worldwide Gross (in billions)
0             Avengers: Endgame  2019                         2.798
1                        Avatar  2007                         2.789
2                       Titanic  1997                         2.194
3  Star Wars: The Force Awakens  2015                         2.073
4                Jurassic World  2015                         1.673
5                 The Lion King  2019                         1.656
6                  The Avengers  2012                         1.519
7                     Frozen II  2019                         1.450
8                        Frozen  2013                         1.276
9          Beauty and the Beast  2017                         1.263>

In [16]:
# That fixed it!
import pandas as pd

most_prof_films_df = pd.DataFrame(top_profitable_films)
most_prof_films_df.head()

                           Film  Year Worldwide Gross (in billions)
0             Avengers: Endgame  2019                         2.798
1                        Avatar  2007                         2.789
2                       Titanic  1997                         2.194
3  Star Wars: The Force Awakens  2015                         2.073
4                Jurassic World  2015                         1.673
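
To make the difference between `head` and `head()` concrete, here is a minimal sketch on a three-row stand-in DataFrame (`films_df` is an illustrative name, not a variable from the exercise). Without parentheses we only get a reference to the method; with them, the method runs and returns a DataFrame.

```python
import pandas as pd

# Small stand-in for the films table used in the exercise.
films_df = pd.DataFrame({
    "Film": ["Avengers: Endgame", "Avatar", "Titanic"],
    "Year": ["2019", "2007", "1997"],
})

method = films_df.head       # no parentheses: a bound method object, nothing is executed
preview = films_df.head(2)   # with parentheses: the call returns the first 2 rows

print(callable(method))        # the method object can still be called later
print(type(preview).__name__)  # the result of the call is a DataFrame
```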


## Exercise 4:

We realised that, in our `most_prof_films_df`, the year of the movie Avatar is wrong. We want to replace it with the correct one, 2009:

In [17]:
# We want to update a single value in one of the rows, which we can do with .loc.
# The chained indexing below assigns to a temporary copy, so pandas warns and the DataFrame stays unchanged.
# To use .loc, we first need to locate the row we want to update.
most_prof_films_df[most_prof_films_df["Film"]=="Avatar"]["Year"] = "2009"

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  most_prof_films_df[most_prof_films_df["Film"]=="Avatar"]["Year"] = "2009"


In [18]:
# We see that the film Avatar is at index 1, and its year is still 2007
most_prof_films_df

                           Film  Year Worldwide Gross (in billions)
0             Avengers: Endgame  2019                         2.798
1                        Avatar  2007                         2.789
2                       Titanic  1997                         2.194
3  Star Wars: The Force Awakens  2015                         2.073
4                Jurassic World  2015                         1.673
5                 The Lion King  2019                         1.656
6                  The Avengers  2012                         1.519
7                     Frozen II  2019                         1.450
8                        Frozen  2013                         1.276
9          Beauty and the Beast  2017                         1.263


In [19]:
# Applying .loc to the DataFrame: row label 1, column "Year"
most_prof_films_df.loc[1,"Year"] = "2009"

In [20]:
# Checking if everything went well
# Looks good :)
most_prof_films_df

                           Film  Year Worldwide Gross (in billions)
0             Avengers: Endgame  2019                         2.798
1                        Avatar  2009                         2.789
2                       Titanic  1997                         2.194
3  Star Wars: The Force Awakens  2015                         2.073
4                Jurassic World  2015                         1.673
5                 The Lion King  2019                         1.656
6                  The Avengers  2012                         1.519
7                     Frozen II  2019                         1.450
8                        Frozen  2013                         1.276
9          Beauty and the Beast  2017                         1.263
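
Hard-coding the index 1 works here, but it relies on knowing where Avatar sits. A possible alternative, sketched on a three-row stand-in DataFrame (`films_df` and `is_avatar` are illustrative names), is to pass a boolean mask to `.loc` so that the row is selected by film title in a single indexing step.

```python
import pandas as pd

# Small stand-in for the films table used in the exercise.
films_df = pd.DataFrame({
    "Film": ["Avengers: Endgame", "Avatar", "Titanic"],
    "Year": ["2019", "2007", "1997"],
})

# Boolean mask: True only for the row(s) whose title is "Avatar".
is_avatar = films_df["Film"] == "Avatar"

# One .loc call with row selector and column label, so the assignment
# reaches the original DataFrame instead of a temporary copy.
films_df.loc[is_avatar, "Year"] = "2009"

print(films_df)
```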


## Exercise 5:

We want to get the average gross profit of all films:

In [22]:
# What we want is the average or mean of one column
most_prof_films_df["Worldwide Gross (in billions)"].avg()

AttributeError: ignored

In [29]:
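# A pandas Series has no .avg() method; the method for the average is .mean().
# The remaining problem is that .mean() only works on numeric data, but our "Worldwide Gross (in billions)" column is full of strings.
# We can check that in the next cell.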
most_prof_films_df['Worldwide Gross (in billions)'].mean()

TypeError: ignored

In [31]:
# When a column contains strings, its dtype is reported as "object".
# So we need to convert the column to numbers first!
print(most_prof_films_df['Worldwide Gross (in billions)'].dtype) 

object
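
Besides printing the dtype, a quick check like the one below can confirm that a column is not numeric before calling `.mean()`. This is only a sketch on a three-value stand-in Series; `gross`, `looks_numeric`, and `first_value_type` are illustrative names.

```python
import pandas as pd

# Stand-in for the gross column: the numbers are stored as strings.
gross = pd.Series(["2.798", "2.789", "2.194"], name="Worldwide Gross (in billions)")

# is_numeric_dtype reports whether pandas treats the column as numbers.
looks_numeric = pd.api.types.is_numeric_dtype(gross)

# Looking at a single element shows what is actually stored in the column.
first_value_type = type(gross.iloc[0]).__name__

print(looks_numeric)
print(first_value_type)
```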


In [32]:
# pandas.to_numeric is the right call here

most_prof_films_df["Worldwide Gross (in billions)"] = pd.to_numeric(most_prof_films_df["Worldwide Gross (in billions)"], downcast="float")

In [34]:
# Checking the dtype of the column again.
# We now see that the dtype changed from object to float32.
# Now we can use the .mean() method.
print(most_prof_films_df['Worldwide Gross (in billions)'].dtype) 

float32


In [35]:
# Et voilà !
most_prof_films_df['Worldwide Gross (in billions)'].mean()

1.8691002
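
The trailing digits in `1.8691002` come from the reduced precision of float32, which is what `downcast="float"` produced above. As a hedged alternative, calling `pd.to_numeric` without `downcast` keeps the default float64. The sketch below uses the same ten values in a stand-alone Series; `gross`, `gross_float64`, and `average` are illustrative names.

```python
import pandas as pd

# The ten gross values from the exercise, stored as strings.
gross = pd.Series(["2.798", "2.789", "2.194", "2.073", "1.673",
                   "1.656", "1.519", "1.450", "1.276", "1.263"])

# Without downcast, to_numeric parses the strings as 64-bit floats.
gross_float64 = pd.to_numeric(gross)
average = gross_float64.mean()

print(gross_float64.dtype)
print(round(average, 4))  # rounded to the precision of the input data
```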