<a href="https://colab.research.google.com/github/NatFT/PythonCourse/blob/main/Ex_Working_with_Data_Lists.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lists

Often we need to store a number of single items of data together so that they can be processed together. This might be because all the data refers to one person (e.g. name, age, gender, etc.) or because we have a set of related values (e.g. all the items that should be displayed in a drop-down list, such as every year from this year back to 100 years ago, so that someone can select their year of birth).

Python has a range of data structures available including:
*   lists  
*   tuples  
*   dictionaries  
*   sets

This worksheet looks at lists.

## List
A list is a collection of related, individual data objects that are indexed and can be processed as a whole, as subsets or as individual items.  Lists are stored, essentially, as contiguous items in memory so that access by index is as quick as possible.  However, they are mutable (they can be changed after they have been created and stored), so they need extra functionality to deal with changing list sizes.
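
As a small illustration of what "mutable" means in practice, the sketch below changes a list after it has been created. The list name `scores` is made up for this example and is not part of the worksheet data.

```python
# 'scores' is a hypothetical list used only for this illustration
scores = [10, 20, 30]

scores[1] = 25        # replace an individual item in place
scores.append(40)     # the list grows; Python resizes it for us
del scores[0]         # the list shrinks again

print(scores)         # [25, 30, 40]
print(len(scores))    # 3
```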

# Let's get some lists of data
For this worksheet we are going to work with data on STEAM games.  We are going to get the data from a spreadsheet and make lists that we can find things out from.



# Creating a list
```
nums = [1, 2, 3, 4, 5]
names = ["Tom","Jerry","Spike"]
```

# Printing a list

```
print(nums)
[1, 2, 3, 4, 5]
```

```
print(names)
['Tom', 'Jerry', 'Spike']
```

In [None]:
# create the lists, and print them
nums = [1,2,3,4,5]
names = ["Tom", "Jerry", "Spike"]
print(nums)
print(names)

[1, 2, 3, 4, 5]
['Tom', 'Jerry', 'Spike']


# Print individual items in the list

We can access any item in a list by its position (index).  Lists are indexed from 0.

To print the first item in a list, use listname[0], to print the last item use listname[-1].
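
A short sketch of how positive and negative indexes line up, and of what happens when an index is out of range. The list `letters` is invented for this illustration.

```python
letters = ["a", "b", "c", "d"]   # hypothetical example list

# a negative index counts back from the end of the list
print(letters[-1] == letters[len(letters) - 1])   # True
print(letters[-2])                                # c

# asking for a position that does not exist raises an IndexError
try:
    print(letters[4])
except IndexError as error:
    print("No item at index 4:", error)
```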




```
print(nums[0])
1
```
```
print(names[1])
Jerry
```

```
print(nums[-1])
5
```


In [None]:
# have a go at printing different items from the lists
print(nums[0])
print(names[1])
print(nums[-1])

1
Jerry
5


# Print a subset of a list

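`listname[start_index : end_index + 1]` returns the items from `start_index` up to and including `end_index`; the number after the colon is one past the last item you want.  
If the subset starts at the first item, or runs to the end of the list, that index can be left out.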
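
The sketch below shows the slicing rule on a made-up list called `letters` (not part of the worksheet data), including the fact that a slice is a separate new list.

```python
letters = ["a", "b", "c", "d", "e"]   # hypothetical example list

middle = letters[1:4]      # indexes 1, 2 and 3; index 4 is not included
print(middle)              # ['b', 'c', 'd']

print(letters[:2])         # ['a', 'b']  (start left out)
print(letters[3:])         # ['d', 'e']  (end left out)
print(letters[-2:])        # ['d', 'e']  (the last two items)

# a slice is a new list, so changing it leaves the original alone
middle[0] = "z"
print(letters)             # ['a', 'b', 'c', 'd', 'e']
```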

```
print(nums[:3])
[1, 2, 3]
```

```
print(names[1:])
['Jerry', 'Spike']
```

```
print(nums[1:3])
[2, 3]
```

In [None]:
# have a go at printing subsets of the lists
print(nums[:3])
print(names[1:])
print(nums[1:3])

[1, 2, 3]
['Jerry', 'Spike']
[2, 3]


# List length

Use the len() function to get the number of items in a list.

There are 5 items in the nums list and 3 in the names list.

Write a function that will:
* print the length of the nums list
* print the length of the names list
* concatenate (add) the two lists together to make a new list called nums_names
* print the length of the new list

Expected output:
```
The length of the nums list is: 5
The length of the names list is: 3
The length of the joined list is: 8
```


In [None]:
def print_list_info():
  print("The length of the nums list is:", len(nums))
  print("The length of the names list is:", len(names))
  nums_names = nums + names
  print("The length of the joined list is:", len(nums_names))

print_list_info()

The length of the nums list is: 5
The length of the names list is: 3
The length of the joined list is: 8


# List methods

You can get an overview of the methods you can use here: https://www.w3schools.com/python/python_lists_methods.asp
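
A few of those methods in action, as a rough sketch on a throwaway list. The names `colours` and `evens` are invented for this illustration.

```python
colours = ["red", "green", "blue", "green"]   # hypothetical example list

colours.append("pink")         # add to the end
colours.insert(0, "black")     # add at a given index, shifting the rest along
colours.remove("green")        # removes the FIRST "green" only
print(colours)                 # ['black', 'red', 'blue', 'green', 'pink']

# range() on its own gives a range object; list() turns it into a real list
evens = list(range(0, 10, 2))
print(evens)                   # [0, 2, 4, 6, 8]
```

Note that `remove()` deletes only the first matching item, which matters when a value appears more than once.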

Then: 
1.  Create the nums and names list again 
2.  Append the number 6 to the nums list, and print
3.  Insert the name "Sylvester" before "Jerry" in the names list and print
4.  Print the length of the nums list
5.  Remove the number 4 from the nums list, and print
6.  Print the max and min of the nums list
7.  Create a new list called new_nums which contains the numbers 40 to 50 (use the range function)

Expected output:  


In [None]:
nums = [1,2,3,4,5]
names = ["Tom", "Jerry","Spike"]
nums.append(6)
print(nums)
names.insert(1, "Sylvester")
print(names)
print(len(nums))
nums.remove(4)
print(nums)
print(max(nums), min(nums))
new_nums = list(range(40, 51))  # range() alone is not a list; list() makes one
print(new_nums)


[1, 2, 3, 4, 5, 6]
['Tom', 'Sylvester', 'Jerry', 'Spike']
6
[1, 2, 3, 5, 6]
6 1
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]


# Now some real data
---

1.  Open the STEAM csv file (which we have taken from Kaggle and have reduced to make it more manageable): https://drive.google.com/file/d/1amPnoBi3uhQXjFaQbUy-L-Y-eeJ1BcxE/view?usp=sharing  

2.  Open the file with Google sheets to see what is in it.  The file contains rows of data, each with a user id and a game that the user has purchased.

3.  NOW, run the code in the cell below to get:  
- users (the list of user ids in the data)
- titles (the list of titles that have been purchased)

In [14]:
import pandas as pd

# open the data file and get copies of the User and Title columns as lists
def get_users_and_titles():
  url = "https://drive.google.com/uc?export=download&id=1rkG8-cp-KLBc1zK4YMLHIsMMyyTVk5Ju"
  data_table = pd.read_csv(url)
  return data_table["User"].tolist(), data_table["Title"].tolist() 

users, titles = get_users_and_titles()

#users
titles

['The Elder Scrolls V Skyrim',
 'Fallout 4',
 'Spore',
 'Fallout New Vegas',
 'Left 4 Dead 2',
 'HuniePop',
 'Path of Exile',
 'Poly Bridge',
 'Left 4 Dead',
 'Team Fortress 2',
 'Tomb Raider',
 'The Banner Saga',
 'Dead Island Epidemic',
 'BioShock Infinite',
 'Dragon Age Origins - Ultimate Edition',
 'Fallout 3 - Game of the Year Edition',
 'SEGA Genesis & Mega Drive Classics',
 'Grand Theft Auto IV',
 'Realm of the Mad God',
 'Marvel Heroes 2015',
 'Eldevin',
 'Dota 2',
 'BioShock',
 'Robocraft',
 "Garry's Mod",
 'Jazzpunk',
 'Alan Wake',
 'BioShock 2',
 'Fallen Earth',
 "Fallout New Vegas Courier's Stash",
 'Fallout New Vegas Dead Money',
 'Fallout New Vegas Honest Hearts',
 'Grand Theft Auto Episodes from Liberty City',
 'Hitman Absolution',
 'HuniePop Official Digital Art Collection',
 'HuniePop Original Soundtrack',
 'The Banner Saga - Mod Content',
 'The Elder Scrolls V Skyrim - Dawnguard',
 'The Elder Scrolls V Skyrim - Dragonborn',
 'The Elder Scrolls V Skyrim - Hearthfire',


---
### Exercise 1 - list head, tail and length of the titles list
---

Write a function, **describe_list()** which will:
*  print the length of the list `titles`
*  print the first 10 items in `titles` (the head)  
*  print the last 5 items in `titles` (the tail)

Expected output:  
```
129511
['The Elder Scrolls V Skyrim', 'Fallout 4', 'Spore', 'Fallout New Vegas', 'Left 4 Dead 2', 'HuniePop', 'Path of Exile', 'Poly Bridge', 'Left 4 Dead', 'Team Fortress 2']
['Fallen Earth', 'Magic Duels', 'Titan Souls', 'Grand Theft Auto Vice City', 'RUSH']
```

In [15]:
def describe_list():
  print(len(titles))   # length of the list
  print(titles[:10])   # head: the first 10 items
  print(titles[-5:])   # tail: the last 5 items

describe_list()

129511
['The Elder Scrolls V Skyrim', 'Fallout 4', 'Spore', 'Fallout New Vegas', 'Left 4 Dead 2', 'HuniePop', 'Path of Exile', 'Poly Bridge', 'Left 4 Dead', 'Team Fortress 2']
['Fallen Earth', 'Magic Duels', 'Titan Souls', 'Grand Theft Auto Vice City', 'RUSH']


---
### Exercise 2 - use a loop to print the first 20 items

Write a function which will:
*  create a new list from the first 20 items of the titles list
*  loop through the new list and print each item


In [19]:
def print_list():
  new_list = titles[:20]   # a slice creates the new list in one step
  for title in new_list:
    print(title)

print_list()

The Elder Scrolls V Skyrim
Fallout 4
Spore
Fallout New Vegas
Left 4 Dead 2
HuniePop
Path of Exile
Poly Bridge
Left 4 Dead
Team Fortress 2
Tomb Raider
The Banner Saga
Dead Island Epidemic
BioShock Infinite
Dragon Age Origins - Ultimate Edition
Fallout 3 - Game of the Year Edition
SEGA Genesis & Mega Drive Classics
Grand Theft Auto IV
Realm of the Mad God
Marvel Heroes 2015


---
### Exercise 3 - count the number of times a title appears in the list

Write a function which will:
*  count the number of times that the title Fallout 4 appears in the list

Expected output:  
168

In [20]:
def count_title():
  count = titles.count("Fallout 4")
  print(count)

count_title()

168


---
### Exercise 4 - remove all duplicates of a title from the list

Write a function which will: remove all duplicate occurrences of Fallout 4 from the titles list, so that it appears only once (Hint:  you can remove an occurrence of Fallout 4 repeatedly until there is only one left)


In [21]:
def remove_duplicates():
  while titles.count("Fallout 4") > 1:
    titles.remove("Fallout 4")
  print(titles.count("Fallout 4"))
remove_duplicates()


1
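
Because `remove()` deletes only the first match it finds, the loop above has to call it again and again. As an alternative sketch, a single pass can build a new list instead. The function name `keep_first_only` and the sample data are assumptions made for this illustration; unlike the loop above, this version keeps the first occurrence rather than the last.

```python
def keep_first_only(items, target):
    """Return a new list with every occurrence of target after the first removed."""
    result = []
    seen_target = False
    for item in items:
        if item == target:
            if seen_target:
                continue          # skip later duplicates of the target
            seen_target = True
        result.append(item)
    return result

# hypothetical sample data standing in for the titles list
sample = ["Spore", "Fallout 4", "Dota 2", "Fallout 4", "Fallout 4"]
cleaned = keep_first_only(sample, "Fallout 4")
print(cleaned)                     # ['Spore', 'Fallout 4', 'Dota 2']
print(cleaned.count("Fallout 4"))  # 1
```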


---
### Exercise 5 - print the counts of the first 10 titles in the list

Write a function which will:
* loop through the first 10 items in the titles list
* for each item print the number of times that title appears in the list


In [23]:
def print_count_of_first_ten():
  for i in titles[:10]:
    print(i,":", titles.count(i))

users, titles = get_users_and_titles()
print_count_of_first_ten()

The Elder Scrolls V Skyrim : 717
Fallout 4 : 168
Spore : 67
Fallout New Vegas : 337
Left 4 Dead 2 : 951
HuniePop : 22
Path of Exile : 339
Poly Bridge : 12
Left 4 Dead : 281
Team Fortress 2 : 2323
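
Calling `count()` once per title rescans the whole list each time. One possible alternative, sketched here on invented sample data, is `collections.Counter` from the standard library, which tallies every item in a single pass.

```python
from collections import Counter

# hypothetical sample data standing in for the titles list
sample_titles = ["Dota 2", "Spore", "Dota 2", "Fallout 4", "Dota 2", "Spore"]

title_counts = Counter(sample_titles)   # one pass over the whole list

print(title_counts["Dota 2"])           # 3
print(title_counts["Portal"])           # 0 (missing titles count as zero)
print(title_counts.most_common(2))      # [('Dota 2', 3), ('Spore', 2)]
```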


---
### Project - work as a team

The users list has the ids of all the users who have purchased STEAM games.

Write a function that will, for the first 100 unique users:
* count how many games have been purchased by each user
* calculate the percentage of all purchases made by each user
* calculate the percentage of all purchases made by these 100 users altogether
* find the id of the user who has purchased the most games of these 100 users
* calculate the average number of games purchased by a user from the 100
* print this information, printing each unique user just once

Do the same with the last 100 users.

Divide up the tasks and each write one part, then try to get them all to work together.
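
One way the parts might fit together, as a rough sketch only: if each team member writes a small function that returns a value rather than printing it, the pieces can be combined afterwards. The function names and the sample data below are assumptions made for this illustration.

```python
def count_purchases(all_users, selected_users):
    """Return a dictionary mapping each selected user id to their purchase count."""
    return {user: all_users.count(user) for user in selected_users}

def top_buyer(purchase_counts):
    """Return the user id with the largest purchase count."""
    return max(purchase_counts, key=purchase_counts.get)

def average_purchases(purchase_counts):
    """Return the mean number of purchases per user."""
    return sum(purchase_counts.values()) / len(purchase_counts)

# hypothetical sample data standing in for the users list
sample_users = [1, 1, 2, 3, 3, 3]
counts = count_purchases(sample_users, [1, 2, 3])
print(counts)                     # {1: 2, 2: 1, 3: 3}
print(top_buyer(counts))          # 3
print(average_purchases(counts))  # 2.0
```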

### Practice 1
---
Get a list of unique user ids

Write some code that will loop through the users list and add each new user id to a new list called **unique_users**

**Expected output**:
12393

In [30]:
# get list of unique user ids

def get_unique_users():
  unique_users = []
  for i in users:
    if i not in unique_users:
      unique_users.append(i)
  print(len(unique_users))
  return unique_users

users, titles = get_users_and_titles()
unique_user_list = get_unique_users()

12393


[151603712,
 187131847,
 59945701,
 53875128,
 234941318,
 140954425,
 26122540,
 176410694,
 197278511,
 150128162,
 197455089,
 63024728,
 297811211,
 76933274,
 218323237,
 302186258,
 126340495,
 256193015,
 194895541,
 30007387,
 170625356,
 159538705,
 167362888,
 208649703,
 299889828,
 225987202,
 195071563,
 254906420,
 247160953,
 308653033,
 144138643,
 197902002,
 97298878,
 173909336,
 198572546,
 219509107,
 202906503,
 92107940,
 251431515,
 233558010,
 99189757,
 30695285,
 259648553,
 201069271,
 48845802,
 226212066,
 221430493,
 62923086,
 250006052,
 65117175,
 227944885,
 144004384,
 236557903,
 11373749,
 140293612,
 187851224,
 192921532,
 54103616,
 222277839,
 298547051,
 264253640,
 125718844,
 230599183,
 280061602,
 38763767,
 164543231,
 211277578,
 214167822,
 163617342,
 295931968,
 196354657,
 165034415,
 298389371,
 27543430,
 126640783,
 119410870,
 243440565,
 157694162,
 154868247,
 263856756,
 124395695,
 126656629,
 197821092,
 294797577,
 22850788
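
The loop in `get_unique_users()` checks `unique_users` again for every id, which gets slower as the list grows. A possible shortcut, sketched on invented sample data, relies on the fact that dictionary keys are unique and keep their insertion order (guaranteed from Python 3.7).

```python
# hypothetical sample data standing in for the users list
sample_users = [101, 101, 205, 101, 333, 205]

# dict keys are unique and remember the order they were first added
unique_sample_users = list(dict.fromkeys(sample_users))

print(unique_sample_users)        # [101, 205, 333]
print(len(unique_sample_users))   # 3
```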

### Practice 2
---

Write code that will create a subset of the unique_users list, called **hundred_users**, containing just the first 100 users.  Loop through the hundred_users list and, for each user, print the number of games that user has purchased (`users.count(unique_user)`)

**Expected output**:
```
40
1
43
505
1
...
...
2
18
27
1
33
```

In [49]:
# print number of games purchased by each of first hundred users

def get_hundred_users():
  hundred_users = unique_user_list[:100]   # the first 100 unique user ids
  hundred_users_dictionary = {}
  for user in hundred_users:
    count = users.count(user)
    hundred_users_dictionary[user] = count
  for user in hundred_users_dictionary:
    print(user,hundred_users_dictionary[user])
  return hundred_users_dictionary

users, titles = get_users_and_titles()
hundred_user_count = get_hundred_users()

151603712 40
187131847 1
59945701 43
53875128 505
234941318 1
140954425 1
26122540 10
176410694 1
197278511 1
150128162 1
197455089 1
63024728 1
297811211 3
76933274 1
218323237 3
302186258 1
126340495 29
256193015 1
194895541 2
30007387 2
170625356 1
159538705 1
167362888 1
208649703 4
299889828 2
225987202 1
195071563 1
254906420 1
247160953 1
308653033 2
144138643 1
197902002 1
97298878 67
173909336 5
198572546 1
219509107 1
202906503 1
92107940 11
251431515 1
233558010 2
99189757 2
30695285 6
259648553 1
201069271 1
48845802 15
226212066 13
221430493 2
62923086 8
250006052 6
65117175 127
227944885 1
144004384 1
236557903 4
11373749 458
140293612 1
187851224 2
192921532 2
54103616 31
222277839 1
298547051 4
264253640 16
125718844 1
230599183 1
280061602 1
38763767 6
164543231 11
211277578 1
214167822 1
163617342 22
295931968 1
196354657 1
165034415 1
298389371 3
27543430 5
126640783 1
119410870 1
243440565 2
157694162 1
154868247 4
263856756 3
124395695 1
126656629 1
197821092 1
294

### Practice 3
---
Write code to calculate the percentage that the first user in the unique_users list has purchased of all the purchases made by users.  Print the user's id and the percentage.

*Hint*:  get the count for that user (as in the last practice), divide it by the number of purchases made (the length of the original users list) and multiply by 100

**Expected output**:  
`151603712 0.03 %`
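
The arithmetic in the hint can be checked by hand. The sketch below uses figures that appear in the worksheet output (40 purchases for the first user, 129511 rows in total); the helper name `percentage_of_total` is an assumption made for this illustration.

```python
def percentage_of_total(count, total):
    """Return count as a percentage of total."""
    return count / total * 100

# figures taken from the worksheet output: 40 purchases out of 129511 rows
share = percentage_of_total(40, 129511)
print(round(share, 2), "%")    # 0.03 %
print(f"{share:.2f} %")        # 0.03 %
```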

In [54]:
# Find percentage purchases bought by first user

def percentage_first_user():
  first_user = users[0]   # index 0 is the first item, which is also the first unique user
  percentage_buys_first_user = users.count(first_user) / len(users) * 100
  print(first_user, round(percentage_buys_first_user,2), "%")

percentage_first_user()

151603712 0.03 %


### Practice 4
---
Write some code that will loop through the `hundred_users` and find the id of the user with the largest number of purchases

**Expected output**:  
```
53875128
```


In [55]:
# find user who has made most purchases

def get_most_purchases():
  most_purchases = 0
  user_most_purchases = None
  for user in hundred_user_count:
    count = hundred_user_count[user]   # reuse the count stored in Practice 2
    if count > most_purchases:
      most_purchases = count
      user_most_purchases = user
  print(user_most_purchases)

get_most_purchases()

53875128


### Practice 5
---
Write some code that will loop through the `hundred_users`, add all the purchases made by them, then calculate this as a percentage of the total number of purchases made (as before).  Divide that percentage by 100 (the number of users in this list) to get the average percentage per user.

**Expected output**:  
```
0.01 %

```

In [63]:
# find percentage of total purchases made by first hundred users

def percentage_first_hundred():
  first_hundred_user_purchases = 0
  for user in hundred_user_count:
    first_hundred_user_purchases += hundred_user_count[user]
  percentage = first_hundred_user_purchases / len(users) * 100
  average_percentage = percentage / 100
  print(round(average_percentage,2),"%")

percentage_first_hundred()

0.01 %


### Practice 6
---

Write some code that will loop through the `hundred_users`, add all the purchases made by them, then divide by 100 (the number of users in this list) to get the average.

**Expected output**:  
```
16.46

```

In [64]:
# find average number of purchases made by first 100 users
def avg_purchases_first_hundred():
  purchases_first_hundred = 0
  for user in hundred_user_count:
    purchases_first_hundred += hundred_user_count[user]
  avg_purch_first_hundred = purchases_first_hundred / 100
  print(avg_purch_first_hundred)

avg_purchases_first_hundred()

16.46


In [65]:
#using sum

def avg_purchases_first_hundred():
  avg_purch_first_hundred = sum(hundred_user_count.values()) / 100
  print(avg_purch_first_hundred)

avg_purchases_first_hundred()

16.46


### Practice 7
---
Put all the above together into a function, and add code to print the average number of games per user, the user id of the user with the maximum number of purchases, and a list of the hundred users' ids with the percentage each has purchased

In [87]:
def process_user_purchases():
#list of unique users
  unique_users = []
  for i in users:
    if i not in unique_users:
      unique_users.append(i)
  print(len(unique_users))  

#number of games purchased by first 100 users
  hundred_users = unique_users[:100]
  hundred_users_dictionary = {}
  for user in hundred_users:
    count = users.count(user)
    hundred_users_dictionary[user] = count
  for user in hundred_users_dictionary:
    print(user,hundred_users_dictionary[user])

#percentage purchased by first user
  percentage_buys_first_user = users.count(users[0]) / len(users) * 100
  print(users[0], round(percentage_buys_first_user,2), "%")

#user who made most purchases
  most_purchases = 0
  user_most_purchases = 0
  for user in hundred_users_dictionary:
    count = users.count(user)
    if count > most_purchases:
      most_purchases = count
      user_most_purchases = user
  print(user_most_purchases)

#percentage of total purchases made by first 100 users
  first_hundred_user_purchases = 0
  for user in hundred_users_dictionary:
    first_hundred_user_purchases += hundred_users_dictionary[user]
  percentage = first_hundred_user_purchases / len(users) * 100
  average_percentage = percentage / 100
  print(round(average_percentage,2),"%")

#average number of purchases made by first 100 users
  purchases_first_hundred = 0
  for user in hundred_users_dictionary:
    purchases_first_hundred += hundred_users_dictionary[user]
  avg_purch_first_hundred = purchases_first_hundred / 100
  print(avg_purch_first_hundred)

#average number of games per user
  average_purchases = len(users) / len(unique_users)
  print("The average number of purchases per user is", round(average_purchases,0))
  
#the user with the most purchases
  most_purchases_overall = 0
  user_most_purchases_overall = 0
  for user in unique_users:
    if users.count(user) > most_purchases_overall:
      most_purchases_overall = users.count(user)
      user_most_purchases_overall = user
  print("User", user_most_purchases_overall, "has the most purchases with", most_purchases_overall)

#percentage purchased by first 100 users
  for user in hundred_users_dictionary:
    percentage_of_purchases = hundred_users_dictionary[user] / len(users) * 100
    print(user, round(percentage_of_purchases,2), "%")

users, titles = get_users_and_titles()
process_user_purchases()

12393
151603712 0.03 %
53875128
0.01 %
16.46
The average number of purchases per user is 10.0
User 62990992 has the most purchases with 1075
151603712 0.03 %
187131847 0.0 %
59945701 0.03 %
53875128 0.39 %
234941318 0.0 %
140954425 0.0 %
26122540 0.01 %
176410694 0.0 %
197278511 0.0 %
150128162 0.0 %
197455089 0.0 %
63024728 0.0 %
297811211 0.0 %
76933274 0.0 %
218323237 0.0 %
302186258 0.0 %
126340495 0.02 %
256193015 0.0 %
194895541 0.0 %
30007387 0.0 %
170625356 0.0 %
159538705 0.0 %
167362888 0.0 %
208649703 0.0 %
299889828 0.0 %
225987202 0.0 %
195071563 0.0 %
254906420 0.0 %
247160953 0.0 %
308653033 0.0 %
144138643 0.0 %
197902002 0.0 %
97298878 0.05 %
173909336 0.0 %
198572546 0.0 %
219509107 0.0 %
202906503 0.0 %
92107940 0.01 %
251431515 0.0 %
233558010 0.0 %
99189757 0.0 %
30695285 0.0 %
259648553 0.0 %
201069271 0.0 %
48845802 0.01 %
226212066 0.01 %
221430493 0.0 %
62923086 0.01 %
250006052 0.0 %
65117175 0.1 %
227944885 0.0 %
144004384 0.0 %
236557903 0.0 %
11373749 0.35 