# Introduction to Data Analytics

Data analytics involves examining raw data to draw conclusions from it. It is used across many industries to make better business decisions and to verify or disprove scientific models and theories.

In this notebook, we'll cover the following:
- A quick review of Python basics: lists and dictionaries.
- Challenges to apply what you've learned.


# Introduction to Lists

A list is a versatile data structure in Python that can hold an ordered collection of items, which can be of different types (like integers, strings, or even other lists).


## Creating a List

To create a list, you simply place your items inside square brackets `[]`, separated by commas.

In [1]:
# Example: Creating a list of numbers
my_list = [10, 20, 30, 40, 50]
print("My List:", my_list)

# Example: Creating a list of mixed data types
mixed_list = [1, "Hello", 3.14, True]
print("Mixed List:", mixed_list)

My List: [10, 20, 30, 40, 50]
Mixed List: [1, 'Hello', 3.14, True]


## List Length

You can find out how many items are in a list by using the `len()` function.

In [2]:
# Example: Finding the length of a list
length_of_my_list = len(my_list)
print("Length of My List:", length_of_my_list)

Length of My List: 5


## List Indexing

Each item in a list has an index, and indexing starts at 0 for the first item. You can retrieve an item by placing its index inside square brackets after the list name.

In [3]:
# Example: Retrieving elements using positive indexing
first_item = my_list[0]
third_item = my_list[2]

print("First item:", first_item)
print("Third item:", third_item)

First item: 10
Third item: 30


## Retrieving Values from a List

The same bracket syntax retrieves the item at any valid position in the list, not just the first or third.

In [4]:
# Example: Accessing elements
second_item = my_list[1]
print("Second item:", second_item)

Second item: 20
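
One detail worth knowing when accessing elements by index: asking for a position that does not exist raises an `IndexError`. The sketch below redefines `my_list` so it can run on its own; `missing_item` and `error_message` are illustrative names, not part of the notebook.

```python
my_list = [10, 20, 30, 40, 50]

# Valid indices for a list of 5 items run from 0 to 4,
# so index 5 is out of range
try:
    missing_item = my_list[5]
except IndexError as err:
    error_message = str(err)
    print("Could not retrieve item:", error_message)
```

Checking `len(my_list)` before indexing is one simple way to avoid this error.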


## Negative Indexing

Negative indexing allows you to access elements from the end of the list. The last item has an index of -1, the second to last has an index of -2, and so on.

In [5]:
# Example: Retrieving elements using negative indexing
last_item = my_list[-1]
second_last_item = my_list[-2]

print("Last item:", last_item)
print("Second to last item:", second_last_item)

Last item: 50
Second to last item: 40
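
To see how negative indices relate to positive ones, the following sketch compares the two. It assumes the same `my_list` as above and redefines it so the snippet is self-contained; the variable names are illustrative.

```python
my_list = [10, 20, 30, 40, 50]

# A negative index -k points at the same item as index len(my_list) - k
last_via_negative = my_list[-1]
last_via_length = my_list[len(my_list) - 1]

print("Same item:", last_via_negative == last_via_length)
```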


## Retrieving Multiple List Elements

You can retrieve multiple elements by using list slicing, which involves specifying a start and stop index.

In [6]:
# Example: Slicing a list to retrieve multiple elements
sub_list = my_list[1:4]  # Retrieves items at index 1, 2, and 3
print("Sub-list:", sub_list)

Sub-list: [20, 30, 40]


## List Slicing

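Slicing a list creates a new list from a subset of its elements. The syntax is `list[start:stop]`, where `start` is the index of the first item to include and `stop` is the index at which the slice ends (the item at `stop` is not included). Leaving out `start` begins the slice at the first item, and leaving out `stop` continues it to the end of the list.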

In [7]:
# Example: Slicing from the beginning to a specific index
start_slice = my_list[:3]  # First three elements
print("Start slice:", start_slice)

# Example: Slicing from a specific index to the end
end_slice = my_list[2:]  # Elements from index 2 to the end
print("End slice:", end_slice)

# Example: Selection from a list
selection = my_list[2:4]
print("Selection:", selection)

Start slice: [10, 20, 30]
End slice: [30, 40, 50]
Selection: [30, 40]
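
Two properties of slices can be illustrated with a short sketch. First, a slice is a new list, so changing it does not change the original. Second, going slightly beyond the `list[start:stop]` syntax above, Python also accepts an optional third value, the step. The names `first_three` and `every_second` are illustrative.

```python
my_list = [10, 20, 30, 40, 50]

# A slice is a new list, so modifying it leaves the original untouched
first_three = my_list[:3]
first_three[0] = 99
print("Slice after change:", first_three)
print("Original list:", my_list)

# An optional third value sets the step; here, every second item
every_second = my_list[::2]
print("Every second item:", every_second)
```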


## List of Lists

A list can contain other lists as elements. This is known as a "list of lists," which allows for more complex data structures.

In [8]:
# Example: Creating a list of lists
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print("List of Lists:", list_of_lists)

List of Lists: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]


## Retrieving from a List of Lists

You can access individual elements within a list of lists by chaining indices together.

In [9]:
# Example: Accessing elements within a list of lists
first_list = list_of_lists[0]  # The first list
first_element_of_first_list = list_of_lists[0][0]  # The first element of the first list

print("First List:", first_list)
print("First element of the first list:", first_element_of_first_list)

First List: [1, 2, 3]
First element of the first list: 1
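
Chained indexing combines with the other techniques covered so far, such as negative indices and loops. The sketch below reuses the same `list_of_lists`; the other names are illustrative.

```python
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# The first index picks the inner list, the second picks the item within it
middle_item = list_of_lists[1][1]

# Negative indices work at both levels
last_item_of_last_list = list_of_lists[-1][-1]

# Collect the first element of every inner list
first_elements = [inner_list[0] for inner_list in list_of_lists]

print("Middle item:", middle_item)
print("Last item of the last list:", last_item_of_last_list)
print("First elements:", first_elements)
```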


# Introduction to Dictionaries

A dictionary is a mutable collection of key-value pairs in Python. Since Python 3.7, dictionaries preserve the order in which items were inserted, but values are looked up by key rather than by position. Each key in a dictionary must be unique and is used to retrieve the corresponding value.

## Creating a Dictionary

You can create a dictionary by placing key-value pairs inside curly braces `{}`, separated by commas. The keys and values are separated by a colon `:`.

In [10]:
# Example: Creating a dictionary with fruit names and their quantities
fruit_dict = {
    'apples': 10,
    'bananas': 20,
    'oranges': 15,
    'pears': 5
}
print("Fruit Dictionary:", fruit_dict)

Fruit Dictionary: {'apples': 10, 'bananas': 20, 'oranges': 15, 'pears': 5}
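
Because keys must be unique, writing the same key twice does not create two entries. A minimal sketch, using a hypothetical `duplicate_demo` dictionary that is not part of the notebook:

```python
# When a key is repeated in a dictionary literal, only the last value is kept
duplicate_demo = {'apples': 10, 'apples': 99}

print("Number of entries:", len(duplicate_demo))
print("Value for apples:", duplicate_demo['apples'])
```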


## Indexing (Accessing Values by Keys)

You can access the value associated with a key by placing the key inside square brackets `[]`.

In [11]:
# Example: Accessing the quantity of bananas
bananas_count = fruit_dict['bananas']
print("Number of Bananas:", bananas_count)

Number of Bananas: 20
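
Square brackets only work for keys that exist; a missing key raises a `KeyError`. The sketch below shows two common ways to handle that, the `in` operator and the `dict.get` method. The key `'kiwis'` and the variable names are illustrative.

```python
fruit_dict = {'apples': 10, 'bananas': 20, 'oranges': 15, 'pears': 5}

# Square brackets raise a KeyError when the key is missing
try:
    fruit_dict['kiwis']
except KeyError:
    print("No entry for kiwis")

# The `in` operator checks whether a key exists
has_kiwis = 'kiwis' in fruit_dict
print("Has kiwis:", has_kiwis)

# dict.get returns a default value (here 0) instead of raising an error
kiwis_count = fruit_dict.get('kiwis', 0)
print("Number of Kiwis:", kiwis_count)
```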


## Key-Value Pairs

Each entry in a dictionary is a key-value pair. You can retrieve all keys, all values, or all key-value pairs using dictionary methods.

In [12]:
# Example: Retrieving all keys and all values
all_keys = fruit_dict.keys()
all_values = fruit_dict.values()

print("Keys:", all_keys)
print("Values:", all_values)

# Example: Retrieving all key-value pairs
all_items = fruit_dict.items()
print("Key-Value Pairs:", all_items)

Keys: dict_keys(['apples', 'bananas', 'oranges', 'pears'])
Values: dict_values([10, 20, 15, 5])
Key-Value Pairs: dict_items([('apples', 10), ('bananas', 20), ('oranges', 15), ('pears', 5)])
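
The objects returned by `keys()`, `values()`, and `items()` are views rather than lists. A short sketch of two common ways to work with them: converting a view to a list, and looping over the key-value pairs. The names `key_list` and `value_list` are illustrative.

```python
fruit_dict = {'apples': 10, 'bananas': 20, 'oranges': 15, 'pears': 5}

# Convert the views to regular lists when list behaviour (such as indexing) is needed
key_list = list(fruit_dict.keys())
value_list = list(fruit_dict.values())
print("First key:", key_list[0])
print("First value:", value_list[0])

# Loop over key-value pairs, unpacking each pair into two variables
for fruit, quantity in fruit_dict.items():
    print(fruit, "->", quantity)
```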


## Updating Values

You can update the value associated with a key by using the assignment operator `=`.

In [13]:
# Example: Updating the quantity of apples
fruit_dict['apples'] = 12
print("Updated Fruit Dictionary:", fruit_dict)

Updated Fruit Dictionary: {'apples': 12, 'bananas': 20, 'oranges': 15, 'pears': 5}
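
The same assignment syntax also adds a new key when the key does not exist yet. The sketch below works on a copy, here called `fruit_copy` (an illustrative name), so that `fruit_dict` keeps the values used in the rest of the notebook.

```python
fruit_dict = {'apples': 12, 'bananas': 20, 'oranges': 15, 'pears': 5}

# Work on a copy so the original dictionary is left unchanged
fruit_copy = fruit_dict.copy()

# Assigning to a key that does not exist adds a new key-value pair
fruit_copy['kiwis'] = 8

# An existing value can also be updated relative to its current value
fruit_copy['pears'] += 3

print("Copy after changes:", fruit_copy)
print("Original:", fruit_dict)
```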


## Finding Unique Values

Sometimes, you may want to know all the unique values in a dictionary. You can achieve this by converting the dictionary values to a set, which automatically removes duplicates. Sets are unordered, so the values may not print in the order they appear in the dictionary.

In [14]:
# Example: Finding unique quantities of fruits
unique_values = set(fruit_dict.values())
print("Unique Values:", unique_values)

Unique Values: {5, 12, 20, 15}
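
If a predictable order is needed, one option is to pass the set to `sorted()`, which returns a list in ascending order. A minimal sketch, with `unique_sorted` as an illustrative name:

```python
fruit_dict = {'apples': 12, 'bananas': 20, 'oranges': 15, 'pears': 5}

# sorted() turns the unordered set into an ordered list
unique_sorted = sorted(set(fruit_dict.values()))
print("Unique Values (sorted):", unique_sorted)  # [5, 12, 15, 20]
```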


## Counting Unique Values

To count how many times each value appears in the dictionary, you can iterate through the values and use another dictionary to keep track of counts.

In [15]:
# Example: Counting the frequency of each quantity
value_counts = {}

for value in fruit_dict.values():
    if value in value_counts:
        value_counts[value] += 1
    else:
        value_counts[value] = 1

print("Value Counts:", value_counts)

Value Counts: {12: 1, 20: 1, 15: 1, 5: 1}
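
As an alternative to the manual loop, the standard library's `collections.Counter` does the same counting in one step. This is a sketch of that alternative, not the approach used above; `sample_quantities` is a made-up list that includes a repeated value so that a count greater than 1 shows up.

```python
from collections import Counter

fruit_dict = {'apples': 12, 'bananas': 20, 'oranges': 15, 'pears': 5}

# Counter builds the value -> count mapping directly
counted = Counter(fruit_dict.values())
print("Value Counts:", dict(counted))

# With a repeated value, the count goes above 1
sample_quantities = [10, 20, 10, 5]
sample_counts = Counter(sample_quantities)
print("Sample Counts:", dict(sample_counts))
```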


## Proportions and Percentages

You can calculate the proportion and percentage of each value relative to the total sum of all values.

In [16]:
# Example: Calculating proportions and percentages of fruit quantities

# Step 1: Calculate the total number of fruits
total_fruits = sum(fruit_dict.values())
print("Total number of fruits:", total_fruits)

# Step 2: Initialize empty dictionaries for proportions and percentages
proportions = {}
percentages = {}

# Step 3: Calculate the proportion and percentage for each fruit type
for key, value in fruit_dict.items():
    proportion = value / total_fruits  # Calculate proportion
    percentage = (value / total_fruits) * 100  # Calculate percentage
    
    # Store the results in the dictionaries
    proportions[key] = proportion
    percentages[key] = percentage

# Step 4: Print the results
print("Proportions:", proportions)
print("Percentages:", percentages)

Total number of fruits: 52
Proportions: {'apples': 0.23076923076923078, 'bananas': 0.38461538461538464, 'oranges': 0.28846153846153844, 'pears': 0.09615384615384616}
Percentages: {'apples': 23.076923076923077, 'bananas': 38.46153846153847, 'oranges': 28.846153846153843, 'pears': 9.615384615384617}
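
The long decimals above can be hard to read. One way to tidy them, sketched here, is to round each percentage with `round()` inside a dictionary comprehension. The name `rounded_percentages` and the choice of one decimal place are illustrative.

```python
fruit_dict = {'apples': 12, 'bananas': 20, 'oranges': 15, 'pears': 5}
total_fruits = sum(fruit_dict.values())

# Build the percentages in one expression, rounded to one decimal place
rounded_percentages = {
    fruit: round(quantity / total_fruits * 100, 1)
    for fruit, quantity in fruit_dict.items()
}
print("Rounded Percentages:", rounded_percentages)
```

Rounded values are easier to present, but they may no longer add up to exactly 100.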


## Frequency Distribution

A frequency distribution counts how many values fall within each of a set of ranges. Here, we group a dictionary of ages into age brackets and count the people in each one. We'll break this process down into multiple steps.

In [17]:
# Step 1: Define a dictionary of ages and a dictionary to store age group counts
ages = {
    'Alice': 25,
    'Bob': 42,
    'Charlie': 17,
    'David': 32,
    'Eva': 15,
    'Frank': 70,
    'Grace': 55,
    'Helen': 82,
    'Ian': 29
}

age_groups = {
    '0-18': 0,
    '19-35': 0,
    '36-50': 0,
    '51-70': 0,
    '71+': 0
}

# Step 2: Iterate through the dictionary and categorize each age
for name, age in ages.items():
    if age <= 18:
        age_groups['0-18'] += 1
    elif 19 <= age <= 35:
        age_groups['19-35'] += 1
    elif 36 <= age <= 50:
        age_groups['36-50'] += 1
    elif 51 <= age <= 70:
        age_groups['51-70'] += 1
    elif age > 70:
        age_groups['71+'] += 1

# Step 3: Print the categorized age groups
print("Age Group Counts:", age_groups)

Age Group Counts: {'0-18': 2, '19-35': 3, '36-50': 1, '51-70': 2, '71+': 1}
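
The same range checks can also be used to record who falls into each group rather than only how many. The sketch below moves the checks into a hypothetical helper function, `age_group_for`, and collects names in a new dictionary, `names_by_group`; neither name is part of the notebook.

```python
ages = {
    'Alice': 25, 'Bob': 42, 'Charlie': 17, 'David': 32, 'Eva': 15,
    'Frank': 70, 'Grace': 55, 'Helen': 82, 'Ian': 29
}

def age_group_for(age):
    """Return the label of the age group that `age` belongs to."""
    if age <= 18:
        return '0-18'
    elif age <= 35:
        return '19-35'
    elif age <= 50:
        return '36-50'
    elif age <= 70:
        return '51-70'
    return '71+'

# Start with an empty list of names for every group
names_by_group = {'0-18': [], '19-35': [], '36-50': [], '51-70': [], '71+': []}

# Add each person's name to the list for their group
for name, age in ages.items():
    names_by_group[age_group_for(age)].append(name)

print("Names by Age Group:", names_by_group)
```

Taking `len()` of each list gives back the same counts as the frequency distribution above.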


# Challenges

## Challenge: Practice with Movie Lists

1. Create a list of your top 5 favourite movies.
2. Find out how many movies are on your list.
3. Retrieve the first and last movie in the list.
4. Use slicing to create a sub-list with the second and third movies.
5. Create a list of lists where each list contains details of a movie (title, release year, director, rating).
6. Retrieve the director's name for the second movie in your list of lists.

**Bonus Challenge**:
- Reverse your list of favourite movies.
- Add a new movie to your list and add its details to the list of lists.

In [None]:
#1

In [None]:
#2

In [None]:
#3

In [None]:
#4

In [None]:
#5

In [None]:
#6

In [None]:
# Bonus 1

In [None]:
# Bonus 2

## Challenge: Practice with Dictionaries (Video Game Sales)

Imagine you are analysing the sales data for a series of games released in the past year. You will store the data in a dictionary where the keys are the game titles and the values are the number of units sold. Your tasks are as follows:

1. **Create a dictionary** where the keys are game titles and the values are the number of units sold. Here’s an example dataset:

    - Hogwarts Legacy - 1,500,000 units
    - Resident Evil 4 Remake - 1,200,000 units
    - Final Fantasy XVI - 1,200,000 units
    - Spider-Man: Miles Morales - 2,000,000 units
    - Forspoken - 300,000 units
    - The Callisto Protocol - 600,000 units
    - Dead Space Remake - 850,000 units
    - Gran Turismo 7 - 1,500,000 units
    - God of War Ragnarök - 3,000,000 units
    - Elden Ring - 3,000,000 units
      
2. **Update the sales figure** for one of the games after a recent promotion or market change.
3. **Identify all the unique sales figures** recorded in the dictionary.
4. **Count how many times each sales figure appears**.
5. **Calculate the proportion and percentage** of each game's sales relative to the total number of units sold.
6. **Create a frequency distribution** of the sales data by categorising the games into the following sales ranges: `0-500,000`, `500,001-1,000,000`, `1,000,001-2,000,000`, and `2,000,001+` units. Count how many games fall into each range.

In [None]:
#1

In [None]:
#2

In [None]:
#3

In [None]:
#4

In [None]:
#5

In [None]:
#6