
# Lists

Often we need to store several individual items of data together so that they can be processed together. This might be because all the data refers to one person (e.g. name, age, gender) or because we have a collection of similar values (e.g. all the items that should be displayed in a drop-down list, such as the years from this year back to 100 years ago, so that someone can select their year of birth).

Python has a range of data structures available including:
*   lists  
*   tuples  
*   dictionaries  
*   sets

This worksheet looks at lists.
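
For orientation, here is a minimal sketch showing one literal of each of the four structures named above. The variable names and values are made up for illustration and are not part of the worksheet data.

```python
# One small example of each built-in structure (values are illustrative only)
years = [1999, 2000, 2001]            # list: ordered and mutable
point = (3, 4)                        # tuple: ordered and immutable
person = {"name": "Tom", "age": 7}    # dictionary: key -> value pairs
genres = {"rpg", "puzzle", "rpg"}     # set: unordered, duplicates are dropped

print(type(years).__name__, len(years))
print(type(point).__name__, len(point))
print(type(person).__name__, len(person))
print(type(genres).__name__, len(genres))
```

Note that the set ends up with two items, because the repeated `"rpg"` is stored only once.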

## List
A list is an ordered collection of related, individual data objects that are indexed and can be processed as a whole, as subsets or as individual items.  Lists are stored, essentially, as contiguous items in memory (in CPython, a contiguous array of references to the objects) so that access by index can be as quick as possible.  They are also mutable (they can be changed after they have been created and stored), so they need extra functionality to deal with changing list sizes.
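
The short sketch below illustrates the two properties just described, indexing and mutability. The list `colours` is a made-up example, not worksheet data.

```python
# Hypothetical example list, used only to illustrate indexing and mutability
colours = ["red", "green", "blue"]

first = colours[0]        # individual item: indexes start at 0
last = colours[-1]        # negative indexes count back from the end
colours[1] = "yellow"     # lists are mutable: replace an item in place
colours.append("purple")  # ...and they can grow after creation

print(first, last)
print(colours)
```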

# Let's get some lists of data
For this worksheet we are going to work with data on STEAM games.  We are going to get the data from a spreadsheet and make lists that we can find things out from.



## Creating a list
---

```
nums = [1, 2, 3, 4, 5]
names = ["Tom","Jerry","Spike"]
```

## Exercise 1
---
Write a function **print_list()** that will create the two lists `nums` and `names`, then will print them as lists, e.g.

```
print(nums)
```

In [None]:
# create the lists, and print them
def print_list():
  nums = [1, 2, 3, 4, 5]
  names = ["Tom","Jerry","Spike"]
  print(nums)
  print(names)


print_list()

[1, 2, 3, 4, 5]
['Tom', 'Jerry', 'Spike']


## Exercise 2
---

Write a function called **print_1st_3rd()** that will print the 1st and 3rd item in the names list.

In [None]:
def print_1st_3rd():
  names = ["Tom","Jerry","Spike"]
  print(names[0])
  print(names[2])


print_1st_3rd()

Tom
Spike


## Exercise 3  - Print a subset of a list
---

Write a function **print_first_2()** which will create the nums list, then print its first two items.
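
Subsets of a list are taken with slice notation, `list[start:stop:step]`, where `stop` is not included. The sketch below uses a made-up list called `letters` to show a few common forms.

```python
letters = ["a", "b", "c", "d", "e"]   # illustrative list

first_two = letters[:2]      # from the start up to, but not including, index 2
middle = letters[1:4]        # indexes 1, 2 and 3
last_two = letters[-2:]      # the final two items
every_other = letters[::2]   # a step of 2 takes every second item

print(first_two, middle, last_two, every_other)
```

Each slice is a new list, so changing `first_two` afterwards would leave `letters` untouched.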


In [None]:
# have a go at printing subsets of the lists
def print_first_2():
  nums = [1, 2, 3, 4, 5]
  print(nums[:2])


print_first_2()

[1, 2]


# List length

Use the len() function to get the number of items in a list.

There are 5 items in the nums list and 3 in the names list.
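
As a small sketch of `len()` together with concatenation, using two made-up lists (`evens` and `odds` are illustrative names only):

```python
evens = [2, 4, 6]
odds = [1, 3]

# + builds a brand new list; evens and odds themselves are unchanged
combined = evens + odds
print(len(evens), len(odds), len(combined))
```

The length of a concatenated list is the sum of the two original lengths.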

Write a function **print_list_info()** that will:
* create both lists
* print the length of the nums list
* print the length of the names list
* concatenate (add) the two lists together to make a new list called num_names
* print the length of the new list

Expected output:
```
The length of the nums list is: 5
The length of the names list is: 3
The length of the joined list is: 8
```


In [None]:
def print_list_info():
  nums = [1, 2, 3, 4, 5]
  names = ["Tom","Jerry","Spike"]
  print(f"The length of the nums list is: {len(nums)}")
  print(f"The length of the names list is: {len(names)}")
  # concatenating with + builds a new list; nums and names are unchanged
  num_names = nums + names
  print(f"The length of the joined list is: {len(num_names)}")


print_list_info()

The length of the nums list is: 5
The length of the names list is: 3
The length of the joined list is: 8


# List methods

You can get an overview of the methods you can use here: https://www.w3schools.com/python/python_lists_methods.asp

Then write a function **list_methods()** that will:
1.  Create the nums and names lists again
2.  Append the number 6 to the nums list, and print it
3.  Insert the name "Sylvester" before "Jerry" in the names list, and print it
4.  Print the length of the nums list
5.  Remove the number 4 from the nums list, and print it
6.  Print the max and min of the nums list
7.  Create a new list called new_nums which contains the numbers 40 to 50 (use the range function), and print it
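
One detail of the last step is worth a sketch: `range()` on its own gives a lazy range object rather than a list, and its stop value is excluded. The names `decade` and `decade_list` below are illustrative only.

```python
decade = range(2010, 2020)    # a range object; 2020 itself is not included
decade_list = list(decade)    # list() turns the range into a real list

print(decade)
print(decade_list)
```

Printing the range object shows `range(2010, 2020)`, while printing the converted list shows the individual numbers.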

**Expected output**: 
``` 
[1, 2, 3, 4, 5, 6]
['Tom', 'Sylvester', 'Jerry', 'Spike']
6
[1, 2, 3, 5, 6]
6 1
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
```
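
A common stumbling block with list methods is that most of them change the list in place and return `None`. The sketch below, using a made-up `scores` list, contrasts that with the built-in `sorted()`, which returns a new list.

```python
scores = [30, 10, 20]        # illustrative data

result = scores.append(40)   # append changes scores in place and returns None
print(result)
print(scores)

ordered = sorted(scores)     # sorted() returns a new list and leaves scores alone
scores.sort(reverse=True)    # sort() reorders scores itself
print(ordered, scores)
```

So writing `nums = nums.append(6)` would replace the list with `None`; call the method on its own line instead.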

In [None]:
def list_methods():
  nums = [1, 2, 3, 4, 5]
  names = ["Tom","Jerry","Spike"]
  nums.append(6)
  print(nums)
  names.insert(1, "Sylvester")
  print(names)
  print(len(nums))
  nums.remove(4)
  print(nums)
  # print max and min separated by a space, as in the expected output
  print(max(nums), min(nums))
  # range(40, 51) stops before 51, so this gives 40 to 50 inclusive
  new_nums = list(range(40, 51))
  print(new_nums)


list_methods()

[1, 2, 3, 4, 5, 6]
['Tom', 'Sylvester', 'Jerry', 'Spike']
6
[1, 2, 3, 5, 6]
6 1
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]


# Now some real data
---

1.  Open the STEAM csv file (which we have taken from Kaggle and have reduced to make it more manageable): https://drive.google.com/file/d/1amPnoBi3uhQXjFaQbUy-L-Y-eeJ1BcxE/view?usp=sharing  

2.  Open the file with Google sheets to see what is in it.  The file contains rows of data, each with a user id and a game that the user has purchased.

3.  NOW, run the code in the cell below to get:  
- users (the list of user ids in the data)
- titles (the list of titles that have been purchased)

In [2]:
import pandas as pd

# open the data file and return the User and Title columns as two lists
def get_users_and_titles():
  url = "https://drive.google.com/uc?id=1rkG8-cp-KLBc1zK4YMLHIsMMyyTVk5Ju"
  data_table = pd.read_csv(url)
  return data_table["User"].tolist(), data_table["Title"].tolist() 

users, titles = get_users_and_titles()
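
If pandas is not available, two lists of the same shape can be built with the standard library `csv` module. This is only a sketch: the column names `User` and `Title` come from the cell above, but the sample rows are made up, and converting the ids with `int()` is an assumption about the real file.

```python
import csv
import io

# Made-up sample rows standing in for the real file
sample_csv = io.StringIO(
    "User,Title\n"
    "5250,Spore\n"
    "5250,Fallout 4\n"
    "76767,Fallout 4\n"
)

sample_users = []
sample_titles = []
for row in csv.DictReader(sample_csv):        # each row is a dict keyed by column name
    sample_users.append(int(row["User"]))     # assumption: ids are whole numbers
    sample_titles.append(row["Title"])

print(sample_users)
print(sample_titles)
```

Both lists have one entry per row, so a user id appears once for every game that user has purchased.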

---
### Exercise 1 - list head, tail and length of the titles list
---

Write a function, **describe_list()** which will:
*  print the length of the list `titles`
*  print the first 10 items in `titles` (the head)  
*  print the last 5 items in `titles` (the tail)

Expected output:  
```
129511
['The Elder Scrolls V Skyrim', 'Fallout 4', 'Spore', 'Fallout New Vegas', 'Left 4 Dead 2', 'HuniePop', 'Path of Exile', 'Poly Bridge', 'Left 4 Dead', 'Team Fortress 2']
['Fallen Earth', 'Magic Duels', 'Titan Souls', 'Grand Theft Auto Vice City', 'RUSH']
```

In [None]:
def describe_list():
  print(len(titles))    # number of purchases in the data
  print(titles[:10])    # head: the first 10 titles
  print(titles[-5:])    # tail: the last 5 titles


describe_list()

129511
['The Elder Scrolls V Skyrim', 'Fallout 4', 'Spore', 'Fallout New Vegas', 'Left 4 Dead 2', 'HuniePop', 'Path of Exile', 'Poly Bridge', 'Left 4 Dead', 'Team Fortress 2']
['Fallen Earth', 'Magic Duels', 'Titan Souls', 'Grand Theft Auto Vice City', 'RUSH']


---
### Exercise 2 - use a loop to print the first 20 items

Write a function which will:
*  create a new list from the first 20 items of the titles list
*  loop through the new list and print each item


In [None]:
def print_first_20():
  # slicing copies the first 20 items into a new list
  new_titles = titles[:20]
  for title in new_titles:
    print(title)

print_first_20()

The Elder Scrolls V Skyrim
Fallout 4
Spore
Fallout New Vegas
Left 4 Dead 2
HuniePop
Path of Exile
Poly Bridge
Left 4 Dead
Team Fortress 2
Tomb Raider
The Banner Saga
Dead Island Epidemic
BioShock Infinite
Dragon Age Origins - Ultimate Edition
Fallout 3 - Game of the Year Edition
SEGA Genesis & Mega Drive Classics
Grand Theft Auto IV
Realm of the Mad God
Marvel Heroes 2015


---
### Exercise 3 - count the number of times a title appears in the list

Write a function which will:
*  count the number of times that the title Fallout 4 appears in the list

Expected output:  
168
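
As a minimal sketch of how `list.count()` behaves, here it is on a small made-up list (`basket` is not part of the worksheet data):

```python
basket = ["apple", "pear", "apple", "fig", "apple"]  # illustrative data

apples = basket.count("apple")   # counts exact matches
kiwis = basket.count("kiwi")     # a value that is absent counts as 0, with no error

print(apples, kiwis)
```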

In [None]:
def count_title():
  print(titles.count("Fallout 4"))
  

count_title()

168


---
### Exercise 4 - remove all duplicates of a title from the list

Write a function which will remove all duplicate occurrences of Fallout 4 from the titles list, leaving just one (Hint:  you can remove an occurrence of Fallout 4 repeatedly until there is only one left).  This will require a while loop.
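
For comparison with the while-loop approach in the hint, here is a sketch of a single-pass alternative. The helper name `keep_one` is hypothetical, and it differs from the hint in two ways: it keeps the first copy rather than the last, and it returns a new list instead of changing the original.

```python
def keep_one(items, target):
    """Return a new list with every copy of target removed except the first."""
    result = []
    seen = False
    for item in items:
        if item == target:
            if seen:
                continue      # skip every later copy of target
            seen = True       # remember that the first copy has been kept
        result.append(item)
    return result

sample = ["A", "B", "A", "C", "A"]   # made-up data
deduplicated = keep_one(sample, "A")
print(deduplicated)
```

Because it walks the list only once, this avoids calling `count()` and `remove()` repeatedly, each of which scans the list from the start.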


In [None]:
def remove_duplicates():
  # list.remove() deletes the first match, so the copy that survives is the last one
  while titles.count("Fallout 4") > 1:
    titles.remove("Fallout 4")
  print(titles[:100])


remove_duplicates()


['The Elder Scrolls V Skyrim', 'Spore', 'Fallout New Vegas', 'Left 4 Dead 2', 'HuniePop', 'Path of Exile', 'Poly Bridge', 'Left 4 Dead', 'Team Fortress 2', 'Tomb Raider', 'The Banner Saga', 'Dead Island Epidemic', 'BioShock Infinite', 'Dragon Age Origins - Ultimate Edition', 'Fallout 3 - Game of the Year Edition', 'SEGA Genesis & Mega Drive Classics', 'Grand Theft Auto IV', 'Realm of the Mad God', 'Marvel Heroes 2015', 'Eldevin', 'Dota 2', 'BioShock', 'Robocraft', "Garry's Mod", 'Jazzpunk', 'Alan Wake', 'BioShock 2', 'Fallen Earth', "Fallout New Vegas Courier's Stash", 'Fallout New Vegas Dead Money', 'Fallout New Vegas Honest Hearts', 'Grand Theft Auto Episodes from Liberty City', 'Hitman Absolution', 'HuniePop Official Digital Art Collection', 'HuniePop Original Soundtrack', 'The Banner Saga - Mod Content', 'The Elder Scrolls V Skyrim - Dawnguard', 'The Elder Scrolls V Skyrim - Dragonborn', 'The Elder Scrolls V Skyrim - Hearthfire', 'Dota 2', 'Ultra Street Fighter IV', 'FINAL FANTASY 

---
### Exercise 5 - print the counts of the first 10 titles in the list

Write a function which will:
* loop through the first 10 items in the titles list
* for each item print the number of times that title appears in the list
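
Calling `count()` inside a loop scans the whole list once per title. A possible alternative, sketched here on made-up data, is `collections.Counter` from the standard library, which tallies every value in one pass. The list `demo_titles` is illustrative only.

```python
from collections import Counter

demo_titles = ["Spore", "Dota 2", "Spore", "RUSH", "Dota 2", "Spore"]

title_counts = Counter(demo_titles)   # one pass over the whole list
for title in demo_titles[:3]:
    print(title, title_counts[title])
```

With only 10 titles to look up, either approach is fine; the difference matters more as the number of lookups grows.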


In [None]:
def print_count_of_first_ten():
  first_10 = titles[:10]   # a slice is already a new list
  print(first_10)
  for title in first_10:
    # count() scans the whole list each time it is called
    print(titles.count(title))


print_count_of_first_ten()

['The Elder Scrolls V Skyrim', 'Spore', 'Fallout New Vegas', 'Left 4 Dead 2', 'HuniePop', 'Path of Exile', 'Poly Bridge', 'Left 4 Dead', 'Team Fortress 2', 'Tomb Raider']
717
67
337
951
22
339
12
281
2323
257


---
### Project - work as a team

The users list has the ids of all the users who have purchased STEAM games.

Write a function that will, for the first 100 users:
* count how many games have been purchased by each user.  
* calculate the percentage of all purchases made by each user
* calculate the percentage of all purchases made by these 100 users altogether
* find the id of the user who has purchased the most games of these 100 users 
* calculate the average number of games purchased by a user from the 100 
* print this information, printing each unique user just once

Do the same with the last 100 users.

Divide up the tasks and each write one part, then try to get them all to work together.
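
One way the separate parts might fit together is sketched below, on made-up ids. The function name `summarise`, its parameters and the returned dictionary keys are all hypothetical, and it assumes the sample is a list with one user id per purchase.

```python
from collections import Counter

def summarise(sample_users, all_purchases):
    """Summarise purchases for the users in sample_users.

    sample_users: list of user ids, one entry per purchase.
    all_purchases: total number of purchases in the full data set.
    """
    counts = Counter(sample_users)                   # purchases per unique user
    total = sum(counts.values())                     # purchases in this sample
    top_user, top_count = counts.most_common(1)[0]   # user with the most purchases
    return {
        "per_user": dict(counts),
        "percent_of_all": 100 * total / all_purchases,
        "top_user": top_user,
        "top_count": top_count,
        "average": total / len(counts),
    }

demo = [11, 22, 11, 33, 11, 22]   # made-up ids
summary = summarise(demo, all_purchases=12)
print(summary)
```

Returning the results rather than printing them inside the function makes it easier for team members to test their parts separately and to reuse the same function for the last 100 users.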

In [3]:
def print_100_users(users):
  # set() drops duplicate ids; sorting and slicing then gives the 100 lowest ids
  unique_users = sorted(set(users))[:100]

  total = 0
  highest = 0
  highest_user = 0
  for user in unique_users:
    # count this user's purchases once and reuse the result
    purchases = users.count(user)
    print(f"user: {user}, has purchased {purchases} games, {(purchases/len(users)) * 100} % of total purchases")
    total = total + purchases
    if purchases > highest:
      highest = purchases
      highest_user = user
  print(f"Total of games purchased by this sample is {total}, {(total/len(users)) * 100} % of total purchases")
  print(f"User with most purchases is {highest_user} {highest}")
  average = total/len(unique_users)
  print(f"Average number of games: {average}")
  


print_100_users(users)

user: 5250, has purchased 21 games, 0.016214838893993562 % of total purchases
user: 76767, has purchased 36 games, 0.027796866675417534 % of total purchases
user: 86540, has purchased 82 games, 0.06331508520511771 % of total purchases
user: 103360, has purchased 10 games, 0.0077213518542826485 % of total purchases
user: 144736, has purchased 8 games, 0.006177081483426118 % of total purchases
user: 181212, has purchased 12 games, 0.009265622225139178 % of total purchases
user: 229911, has purchased 27 games, 0.020847650006563148 % of total purchases
user: 298950, has purchased 259 games, 0.1999830130259206 % of total purchases
user: 299153, has purchased 14 games, 0.010809892595995707 % of total purchases
user: 381543, has purchased 10 games, 0.0077213518542826485 % of total purchases
user: 547685, has purchased 25 games, 0.019303379635706622 % of total purchases
user: 554278, has purchased 28 games, 0.021619785191991415 % of total purchases
user: 561758, has purchased 148 games, 0.1142