## Intro to Python

### The Basics

#### Printing

To print something to the output, we use the `print` function:

In [None]:
print("Hello world")

#### Numerical operations

At its most basic level, Python is a calculator. As such, you can perform all the usual operations you might expect:

In [None]:
# Addition
8 + 5

In [None]:
# Subtraction
5 - 3

In [None]:
# Multiplication
6*7

In [None]:
# Division
16/4

In [None]:
# Exponentiation
9**2
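
One detail worth noting about the division cell above: `/` always returns a float, even when the result is a whole number. The short sketch below shows this, along with the related floor division (`//`) and remainder (`%`) operators; the variable names are only for illustration.

```python
# True division always returns a float, even for a whole-number result
quotient = 16 / 4
print(quotient)        # 4.0

# Floor division discards the fractional part and keeps the integer
floor_quotient = 17 // 4
print(floor_quotient)  # 4

# The modulo operator gives the remainder of a division
remainder = 17 % 4
print(remainder)       # 1
```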

#### Variables

In programming, we can define variables, which are essentially just named pieces of data. We can define a new variable (or overwrite the value of an existing one) using the _assignment_ operator, which in Python is just the `=` sign:

In [None]:
my_name = "Dan"
my_age = 35

Once we have assigned a value to a variable, it keeps that value until we change it. We can print a variable to see its value:

In [None]:
print(my_name)

We can also perform operations on variables:

In [None]:
first_number = 42
second_number = 15

In [None]:
first_number + second_number

In [None]:
first_number*second_number
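
To illustrate the earlier point that a variable keeps its value until it is reassigned, here is a minimal sketch. It reuses the `my_age` name from above; `age_before` is a hypothetical helper name added for the example.

```python
my_age = 35
age_before = my_age   # remember the original value

# Reassigning overwrites the old value with the new one
my_age = my_age + 1

print(age_before)     # 35
print(my_age)         # 36
```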

#### Data types

All variables have a data _type_, which depends on the type of data they hold. The fundamental datatypes for the core Python programming language are:

- Integers
- Floats (floating point numbers, a.k.a. decimal numbers)
- Strings (sequences of characters)
- Booleans (True/False values)

In [None]:
int_var = 5
type(int_var)

In [None]:
float_var = 3.1415
type(float_var)

In [None]:
string_var = 'I love Python'
type(string_var)

In [None]:
boolean_var = False
type(boolean_var)
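
Values can often be converted from one of these types to another using the built-in `int`, `float` and `str` functions. The sketch below is one way this might look; the variable names are made up for the example.

```python
# Convert a string of digits into an integer so we can do arithmetic with it
number_from_text = int("42")
print(number_from_text + 1)   # 43

# Convert an integer into a float
as_float = float(5)
print(as_float)               # 5.0

# Convert a number into a string so it can be joined to other text
message = "Dan is " + str(35)
print(message)                # Dan is 35
```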

#### Conditions

We can also check for the truth value of conditions by using conditional operators. The output will be a boolean (True/False) value:

In [None]:
# Less than or greater than
print(5 < 4)
print(2 > 1)

In [None]:
# Checking if two values are equal - note the double equals sign
print('Apples' == 'Oranges')
print('Apples' == 'Apples')
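
A few other comparison operators follow the same pattern, and conditions can be combined with `and`. This is a small sketch using a hypothetical `temperature` value:

```python
temperature = 21  # hypothetical value, for illustration only

# Not equal to
is_not_thirty = temperature != 30
print(is_not_thirty)   # True

# Greater than or equal to, and less than or equal to
at_least_18 = temperature >= 18
at_most_20 = temperature <= 20
print(at_least_18)     # True
print(at_most_20)      # False

# Both conditions must be True for the combined result to be True
is_comfortable = (temperature >= 18) and (temperature <= 25)
print(is_comfortable)  # True
```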

#### Lists

There are many more datatypes than the fundamental ones outlined above, but they are beyond the scope of this session. However, one other useful datatype to know about is the **list** type. Lists are _compound_ datatypes that can be collections of other data types. We define them using square brackets:

In [None]:
# A list of strings
my_list = ["Apples", "Oranges", "Bananas"]
print(my_list)

The above list is a list of three string values, but we could have a list of numbers too:

In [None]:
# A list of numbers
my_list = [1, -15, 2.718]
print(my_list)

Or even a list of different datatypes:

In [None]:
my_list = [7, "Instagram", True, 1.618]

In Python, compound datatypes can be indexed using square brackets. Indexing starts at 0, so index 1 refers to the second element:

In [None]:
my_list[1]
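
As a short sketch of how list indexing behaves, the example below uses the same list as above. Positions are counted from 0, negative positions count from the end, and the built-in `len` function gives the number of items. The names on the left-hand side are only for illustration.

```python
my_list = [7, "Instagram", True, 1.618]

first_item = my_list[0]    # positions start at 0
second_item = my_list[1]
last_item = my_list[-1]    # negative positions count back from the end
number_of_items = len(my_list)

print(first_item)          # 7
print(second_item)         # Instagram
print(last_item)           # 1.618
print(number_of_items)     # 4
```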

### Using Python Packages

Provided we have a package installed on our computer, we can use it in our Python code by _importing_ it. To do this, we use the `import` command. In the cell below, we import the **Pandas** package:

In [None]:
import pandas

After executing this line, the Pandas package is imported into our coding environment, and we have access to all of the data structures and functions it contains. To use anything from the package, we prefix it with the package name, so that the Python interpreter knows where to look for it. In the cell below, we create an instance of a tabular data structure which is defined in the Pandas package, the DataFrame:

In [None]:
df = pandas.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})
df

It can be a bit annoying to have to type out the full package name every time we want to use one of its functions, so instead, we normally give the package an alias when we import it:

In [None]:
import pandas as pd

Now we only need to use the name `pd` whenever referencing a Pandas function:

In [None]:
df = pd.DataFrame({'A':[1, 2, 3], 'B': ['a', 'b', 'c']})
df

### The Pandas Package

The Pandas (short for **Pan**el **Da**ta) package is one of the most commonly used packages in Python work at Oliver Wyman. It is a fundamental package for working with tabular data and can handle much larger datasets than Excel, typically while also being quicker.

#### Importing Data

When performing data analysis, the first thing you'll need to do is read your data from some source into Python. The data could be

- in a file on your computer
- in a database
- hosted on a cloud service

Pandas has many functions for reading in data from wherever it is stored. In our case, the data is in a CSV file hosted on a cloud service called GitHub. To read in our data we can therefore use the `read_csv` function from Pandas:

In [None]:
df = pd.read_csv("https://raw.githubusercontent.com/danukenOW/wmp-intro-to-python/main/song_data.csv") 
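
The `read_csv` function accepts a file path or URL, and it can also read from a file-like object. The sketch below uses a small, made-up piece of CSV text (not the real song dataset) wrapped in `io.StringIO`, so it runs without an internet connection. The column names and values are assumptions chosen for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file
csv_text = """title,artist,album_year
Song A,Artist X,1994
Song B,Artist Y,2021
"""

# StringIO makes the text behave like an open file
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.shape)  # (2, 3): two rows, three columns
```

The first line of the CSV is used for the column names, and each following line becomes one row of the DataFrame.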

#### Viewing the data

We can view the first 5 rows of our dataset by using the `head` method. A method is a special kind of function that is attached to a data structure (here, our DataFrame). To access it, we use dot notation:

In [None]:
df.head()

We could also use the `tail` method instead if we wanted to view the bottom 5 rows. This dot notation is used whenever we want to perform some operation or calculation on our dataset.
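
Both `head` and `tail` accept an optional number of rows to return instead of the default 5. A minimal sketch on a small made-up DataFrame (`numbers_df` is a hypothetical name):

```python
import pandas as pd

# A made-up DataFrame with ten rows, numbered 0 to 9
numbers_df = pd.DataFrame({"n": range(10)})

first_three = numbers_df.head(3)  # first 3 rows
last_two = numbers_df.tail(2)     # last 2 rows

print(first_three)
print(last_two)
```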

#### Filtering/selecting data

To select a single column of our dataset, we use square brackets, like we did with lists. In the square brackets, we must provide the name of the column:

In [None]:
# Select a column
df['title']

To select rows of our dataset, we use the `loc` (location) indexer, which looks rows up by their index label. With the default index, label 0 is the first row:

In [None]:
df.loc[0]

We can also use `loc` when we want to filter our dataset based on some condition. For example, if we wanted to select all the rows where the `album_year` is 1994, we would do the following:

In [None]:
df.loc[df['album_year']==1994]
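
Several conditions can be combined inside `loc` using `&` (and) or `|` (or), with each condition wrapped in parentheses. The sketch below uses a small made-up DataFrame rather than the real song data, so the values and the `songs` name are assumptions for illustration only.

```python
import pandas as pd

# Made-up data with the same kind of columns as the song dataset
songs = pd.DataFrame({
    "title": ["Song A", "Song B", "Song C", "Song D"],
    "album_year": [1994, 2021, 1994, 2010],
    "popularity": [80, 95, 60, 70],
})

# Rows from 1994
from_1994 = songs.loc[songs["album_year"] == 1994]

# Rows from 1994 that also have popularity above 70
popular_1994 = songs.loc[(songs["album_year"] == 1994) & (songs["popularity"] > 70)]

# Filter rows and pick out a single column in one step
titles_1994 = songs.loc[songs["album_year"] == 1994, "title"]

print(from_1994)
print(popular_1994)
print(titles_1994)
```

The parentheses around each condition matter, because `&` binds more tightly than `==` and `>` in Python.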

#### Summary statistics

We can also get summary statistics for our data, using the appropriate method:

In [None]:
print(df['popularity'].mean())
print(df['duration_seconds'].max())
print(df['album_year'].min())

Or count how many rows take each distinct value in a column:

In [None]:
df['key'].value_counts()
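
To make the output of these methods easier to check by hand, here is a sketch on a small made-up DataFrame. The values are invented for the example and are not taken from the song dataset.

```python
import pandas as pd

# Made-up data: six songs with a musical key and a popularity score
songs = pd.DataFrame({
    "key": ["C", "G", "C", "A", "C", "G"],
    "popularity": [80, 95, 60, 70, 85, 90],
})

# Count how many rows have each key, most frequent first
key_counts = songs["key"].value_counts()
print(key_counts)

# Average of the popularity column: (80 + 95 + 60 + 70 + 85 + 90) / 6
mean_popularity = songs["popularity"].mean()
print(mean_popularity)  # 80.0
```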

#### Creating a new dataframe

If you want to create a new dataframe which is some filtered view of the original, you can always do so by saving the view to a new variable:

In [None]:
mariah_carey_songs = df.loc[df['artist']=='Mariah Carey']
mariah_carey_songs.head()

### Exercise

You have been approached by a popular music streaming service and asked to get some insights about the most popular songs on their platform.

Using what you know about Pandas, and the dataset above on the Daily Top 50 List from 2023, answer the following questions:

1. How many songs are more than 5 years old?
2. What is the average duration of a song in the Top 50?
3. Are there more explicit or non-explicit songs in the Top 50?
4. Do Top 50 songs tend to be more danceable or more acoustic?
5. Which artist(s) have had the most number 1 songs?

**Extra challenge:**

Do songs with higher speechiness (> 0.5) tend to be more explicit than lower speechiness songs?

In [None]:
# Question 1


In [None]:
# Question 2


In [None]:
# Question 3


In [None]:
# Question 4


In [None]:
# Question 5
