<a href="https://colab.research.google.com/github/PythonDecorator/AI_Data_Science_MSC/blob/Week_3_reading_files/Workshop3Tasks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Programming for AI & Data Science

## Workshop 3

## File IO, CSV, and JSON File Formats
________________________________________

## Aims of the workshop

Last week we looked at functions and classes, and how we might build up our repertoire of code into reusable sections using these concepts, as well as break down problems into smaller, easier to manage, sub-problems. An example of this was an exercise from last week where we converted grades into classifications, whereby we can build a simple function which takes one number, and produces one string. Using our knowledge of list comprehensions we can simply apply this basic function to each element of grades.

This week, we’ve looked at File Input & Output, looking at how Python can interact with any operating system which is running it (Windows/Mac/Linux) and manipulate files with reading and writing (or both). We explored how we might save data to files such that it can be easily understood by other parties, and why data formats are beneficial in this regard. Notable examples of this were the Comma-Separated Values (CSV) and Javascript Object Notation (JSON) data formats.
Along the way we also introduced some new vocabulary, and keywords such as context managers (using the with keyword), as well as continue (used in for loops).

In this workshop, we will also briefly consider Numpy which is a library for numerical computing within Python. These are used to provide some nice functionality for calculation purposes. Most of the numerical computing within python either uses this directly, or builds on-top of it.

Please see ‘Useful Information’ below on how to lookup certain Python functionality. The concept behind this workshop is about discovery, and experimentation surrounding topics covered so far.

Feel free to discuss the work with peers, or with any member of the teaching staff.

----

## File IO

For this workshop, as always, save a copy of the completed notebook. Name this something memorable.
E.g “Workshop 3”

## Reading Files

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Exercise 1
Create a text file in the same directory as your python notebook for this workshop session. Fill this text file with content of your choice. These could be your favourite song lyrics. Dear Theodosia from Hamilton will be my choice here.
Name this file __ex1_data.txt__.

Note: I will upload my ex1_data.txt with my “Dear Theodosia” lyrics to Canvas if you wish to follow along with my examples.

Example:
> ```
Dear Theodosia, what to say to you?
You have my eyes
You have your mother's name
When you came into the world, you cried and it broke my heart
I'm dedicating every day to you
Domestic life was never quite my style
...


At the moment, we are not interested in the format of what is in the file. Just the raw text within it.
Paste the text you are using for this exercise in the cell below so we can check the exercise results later.

---

In [39]:
content = """Dear Theodosia, what to say to you?
You have my eyes
You have your mother's name
When you came into the world, you cried and it broke my heart
I'm dedicating every day to you
Domestic life was never quite my style"""

my_file = open('/content/drive/MyDrive/University of Hull/Week 3/ex1_data.txt', mode='w')
my_file.write(content)
my_file.close()


---
### Exercise 2
In our Workshop 3 Notebook, let’s open this file, and get Python reading this data in.

From the lectures, we know that we can use `open()` to open any file path we provide. Remember, `open()` returns a file object itself. Assign this to a sensibly named variable. For now, we will specify the file path as the <u>local file path</u>, and we will explicitly pass the argument `mode='r'` even though this is the default value.

Once we have this assigned to a variable, we can print the type of the variable (it’s an object), to see what it is.
```
my_file = open('ex1_data.txt', mode='r')
print( my_file )
print( type(my_file) )
```
This should look something similar to the following:

> ```
<_io.TextIOWrapper name='ex1_data.txt' mode='r' encoding='cp1252'>
<class '_io.TextIOWrapper'>
> ```

__Additionally__, try replacing the `'ex1_data.txt'` with the full path equivalent instead. For me this might look like `'/home/cssbct/Documents/Data Science/Wk3/Workshop/ex1_data.txt'` and yours may look like `'C:/Users/...'` etc. This depends on your Operating System (Windows/Mac/Linux). Re-run the cell and make sure it also works.

__Reminder:__ Using a full path name to a file makes you code non-portable and difficult to share or run on other machines than your own so they should be used with caution. Certainly they should be avoided in your final assessment.

---


In [31]:
my_file = open('/content/drive/MyDrive/University of Hull/Week 3/ex1_data.txt', mode='r')
print( my_file )
print( type(my_file) )

<_io.TextIOWrapper name='/content/drive/MyDrive/University of Hull/Week 3/ex1_data.txt' mode='r' encoding='utf-8'>
<class '_io.TextIOWrapper'>


---
### Exercise 3
Let’s read some data out from this file object. We can use a `for` loop which iterates directly over lines in the file itself.
For each line in the file, print the line.
```
for line in my_file:
    print(line)
```
> Dear Theodosia, what to say to you?  
You have my eyes  
You have your mother's name  

__Note__: If things don’t seem to print, read the beginning of Ex 4 and it might make sense.

---

In [32]:
for line in my_file:
    print(line)

Dear Theodosia, what to say to you?

You have my eyes

You have your mother's name

When you came into the world, you cried and it broke my heart

I'm dedicating every day to you

Domestic life was never quite my style


---
### Exercise 4
Execute the `for` loop again.
We have a problem with the previous exercise. If we try and execute the `for` loop again, just printing the line, we won’t get anything out!
```
for line in my_file:
    print(line)
```
>

This is because we’ve already hit the end of the file. Python is doing some ‘tracking’ in the background which keeps track of where in the file we are. As we already hit the end of the file by iterating over it with Exercise 3, when we try to iterate again, no new lines have been added. Thus printing nothing!
This isn’t ideal. So every time we want to iterate through the file here, we need to reopen the file!

Combine the Notebook cell which opens the file, with the iterating code from Exercise 3. From the lecture’s this week, we also know that it’s important to <u>close</u> a file once we’ve finished with it. So let’s also add that to the end.

__To Merge cells__: Select a Cell, go to Edit -> Merge Cell (Above/Below).

Once this is done we should have something which looks like the following:
```
my_file = open('ex1_data.txt', mode='r')
print( my_file )
print( type(my_file) )

for line in my_file:
    print(line)
    
my_file.close()
```
This is good, as we know all of that code will execute in-order, at once. If they are in separate notebook cells we can technically execute them in any order! Now this cell will open, read, and close the file.

---

In [34]:
for line in my_file:
    print(line)


---
### Exercise 5
Now that we have everything together, and it’s not going to behave strangely when printing lines, we can begin to change our simple print expression to something more complex.

We can search for strings within other strings by executing the following boolean expression:
```
<string> in <other string>
```
E.g.: In my example, I want to check if the line we’re checking contains the word Theodosia.
Note: `“theodosia” != “Theodosia”` (Capital letters are different to lowercase in programming!).
```
if "Theodosia" in line:
    	print("Found!")
```        
If I execute this cell again, it should open the file, iterate through it, and print “Found her!” for the line in which this comes up.

> #### Found!

Depending on what is in your text document, search for some string which is present.

----

In [None]:
for line in my_file:
    if "Theodosia" in line:
        print("Found!")

my_file.close()

Found!


---
### Exercise 6
This isn’t very useful either. I want to know the line number where this lyric comes up. We can use enumerate, covered in previous workshops, to provide line numbers alongside the elements themselves.

1.	Modify your for loop to use enumerate which is passed your file object.
2.	Replace your print statement, to now print the line numbers which your string appears.

> __Found! At:   0__

In my example, Theodosia only appears once, on line 0 (the first line).
If I wanted to find a more common word. I’m going to search for “We”.

> __Found! At:   9__  
> __Found! At:  11__   
> __Found! At:  32__  
> __Found! At:  34__  

Note: I can call the function `lower()` on a string to convert it all to lowercase. This might make checking easier. I can check for “We” and “we” with a single check now.

Example:
```
if "we" in line.lower():
    	print("Found! At: ", line_no)
```
I’ve now found some more entries which I missed before! (I was only checking for “We” prior).

> __Found! At:   9__   
> __Found! At:  10__  
> __Found! At:  11__  
> __Found! At:  26__  
> __Found! At:  32__  
> __Found! At:  33__  
> __Found! At:  34__   

---

In [None]:
my_file = open('/content/drive/MyDrive/University of Hull/Week 3/ex1_data.txt', mode='r')

for line_no, line in enumerate(my_file):
    if "you" in line.lower():
        print("Found! At: ", line_no)


Found! At:  0
Found! At:  1
Found! At:  2
Found! At:  3
Found! At:  4


__Footnote about the end of lines in text files__. Many Python programmers need not know about this level of detail, but people who have programmed in other languages may be aware of the issue. Different Operating Systems denote the end of line in a text file in different ways. On linux and unix it is marked by a single `\n`; on MacOS it uses `\r` and on Windows it uses both as `\r\n`. Fortunately, when processing text files in Python the system conveniently presents all line ends as `\n` making the coding easier and more portable.

---
### Exercise 7
Instead of just printing the lines where we’ve found “we” (lowercase and uppercase variants), let’s append these lines to a fresh List.

1.	Create an empty list, giving it a suitable variable name. Make sure this is done before your for loop! Otherwise you’ll constantly make empty ones.
2.	If your chosen word is in the line, append the line to this List.
3.	After the file close line, print the List (both ways, see below).

We can see that it is indeed a List of strings:

> __Found! At:  34__   
> "We'll bleed and fight for you, we'll make it right for you\n", 'If we lay a strong enough foundation\n', "We'll pass it on to you, we'll give the world to you\n", 'I swear that\n', "We'll bleed and fight for you, we'll make it right for you\n", 'If we lay a strong enough foundation\n', "We'll pass it on to you, we'll give the world to you\n"]  

Or a nicer way, we can iterate through this List we made, and print each individual item:

> __Found! At:  34__  
> We'll bleed and fight for you, we'll make it right for you
>
> If we lay a strong enough foundation
>
>We'll pass it on to you, we'll give the world to you

---

In [None]:
my_file = open('/content/drive/MyDrive/University of Hull/Week 3/ex1_data.txt', mode='r')
foumd_lines = []
for line_no, line in enumerate(my_file):
    if "you" in line.lower():
        foumd_lines.append(line)

my_file.close()

print(foumd_lines)

for line in foumd_lines:
    print(line)


['Dear Theodosia, what to say to you?\n', 'You have my eyes\n', "You have your mother's name\n", 'When you came into the world, you cried and it broke my heart\n', "I'm dedicating every day to you\n"]
Dear Theodosia, what to say to you?

You have my eyes

You have your mother's name

When you came into the world, you cried and it broke my heart

I'm dedicating every day to you



---
### Exercise 8

Notice how we seem to be getting an extra space between prints. This is due to that “\n” which appears at the end - or rather, all of the \n.

We can remove the trailing newlines and spaces by calling `rstrip()` on our strings. To prove this we can check the following:
```
nightmare = "This is excessive\n\n\n\n\n\n\n\n\n       	"
print(nightmare)
```

> This is excessive    
>  &nbsp;  
>  &nbsp;  
>  &nbsp;  
>  &nbsp;  
>  &nbsp;  
>  &nbsp;  
>  &nbsp;  
>  &nbsp;  

Yes, that is a lot of blank space…
If we `print nightmare.rstrip()` it will return a string which is rid of those special characters and all the spaces! The original variable is __NOT__ modified.
```
print(nightmare.rstrip())
```

> This is excessive

It is immediately obvious that the newlines are gone (left), however, spaces are trickier. We can do some string concatenation if we really wanted to prove the spaces are also gone:
```
print(nightmare.rstrip() + "test")
```
> This is excessivetest

Note: The existence of `rstrip` does indeed suggest the existence of a normal `strip` function, as well as an `lstrip`. Feel free to look these up.


---

In [None]:
for line in foumd_lines:
    print(line.rstrip())


Dear Theodosia, what to say to you?
You have my eyes
You have your mother's name
When you came into the world, you cried and it broke my heart
I'm dedicating every day to you


---
## Writing Files

### Exercise 9
In a new cell, we need to open a file to write. Let’s write to the same file we’ve been working with. This time we will specify `mode='w'`. We know we need to close the file, so let’s write that in at the bottom before we forget.
```
ex9_file = open('ex1_data_copy.txt', mode='w')
print( ex9_file )
print( type(ex9_file) )

# Do stuff

ex9_file.close() # Might as well write this in before we forget.
```
As before, we should see it’s a valid file, and the type is correct.

Even though we’ve not told Python to write anything, you may notice that a file has been created! It’s just empty.

---

In [None]:
ex9_file = open('/content/drive/MyDrive/University of Hull/ex1_data_copy.txt', mode='w')
print( ex9_file )
print( type(ex9_file) )

# Do stuff

ex9_file.close() # Might as well write this in before we forget.

<_io.TextIOWrapper name='/content/drive/MyDrive/University of Hull/ex1_data_copy.txt' mode='w' encoding='utf-8'>
<class '_io.TextIOWrapper'>


---
### Exercise 10
We can call `ex9_file.write()` passing it a string to write to the file. The behaviour of this depends on the file mode. In our case, we’re on write, which will erase any existing content with what we put, if the file doesn’t exist it will create it.

<u>Write some content to this file</u> by modifying the `#Do Stuff` section of our code.

After this open up the file and check what you intended to write. (Open in Notepad or Notepad++).

> ## We're taking the hobbits to Isengard!

----

In [None]:
ex9_file = open('/content/drive/MyDrive/University of Hull/ex1_data_copy.txt', mode='w')
print( ex9_file )
print( type(ex9_file) )

ex9_file.write("## We're taking the hobbits to Isengard!")


<_io.TextIOWrapper name='/content/drive/MyDrive/University of Hull/ex1_data_copy.txt' mode='w' encoding='utf-8'>
<class '_io.TextIOWrapper'>


---
### Exercise 11
Some time later we realised that we missed off some information. Let’s append that to our file now.

1.	Open the file in append mode
2.	Create a List, and put some strings you want to write in it. (Note: These can also be numbers converted to strings - If we try numbers, we’ll get an error).
```
some_jibberish = ["Doe, a deer", "a female deer", "far", "a long long way to run!" ]
```
3.	Iterate through this list, and for each element write this to the file.
4.	Don’t forget to close the file!

Open your file once finished, and check if the original content is present, with your additions added to the bottom in sequence order.

> We're taking the hobbits to Isengard!Doe, a deera female deerfaralong long was to run!Doe, a deera female deerfara long long was to run!

5.	Whoops! They’re all on the same line in the file! Remember to add that special `\n` to each string you want to add! We can either change each string element literal. Or do some string concatenation in the write line itself!
    a.	E.g. Instead of `.write(s)` we can do `.write( s + "\n" )`. Saves us a lot of typing!

> We're taking the hobbits to Isengard!Doe, a deer  
a female deer  
far  
a long long was to run!

6.	Ahh! Almost. We need to make sure we write a newline at the end of the Exercise 9 bit, or before the first line written for this exercise. Try fixing this so you get the following output (swings and roundabouts on this one):

> We're taking the hobbits to Isengard!  
Doe, a deer  
a female deer  
far  
a long long was to run!

---

In [37]:
some_jibberish = ["Doe, a deer", "a female deer", "far", "a long long way to run!" ]
ex9_file = open('/content/drive/MyDrive/University of Hull/ex1_data_copy.txt', mode='a')
for line in some_jibberish:
    ex9_file.write(line + "\n")
ex9_file.close()

---
### Exercise 12
In the lectures we introduced Context Managers. Let’s use those now.  
Recall:
```
f = open(..)
# Do stuff
f.close()
```
Becomes:
```
with open(..) as f:
	# Do Stuff.
```
Duplicate your solutions to the previous answers involving file opening, replacing the clunky open, and close mechanisms with some Context Managers.

---

In [38]:
with open('/content/drive/MyDrive/University of Hull/ex1_data_copy.txt', mode='a') as ex9_file:
    for line in some_jibberish:
        ex9_file.write(line + "\n")

---
### Exercise 13
Try the following in a new cell:
```
with open("./no_exist.txt") as f:
	pass
```
You should get the following output:
```
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_5680/3485904372.py in <module>
----> 1 with open("./no_exist.txt") as f:
      2         pass

FileNotFoundError: [Errno 2] No such file or directory: './no_exist.txt'
```
This week we also introduced `try` and `except` for helping with Error Handling which follows the following format:
```
try:
	# Do stuff here.
except:
	# Any Issues, do this block.
```
1.	Put the context manager, in its entirety, within the `try` block.
2.	Add a nice print message into the `except` block which alerts you that something happened.
```
try:
    with open("./no_exist.txt") as f:
        pass
except:
    print("Uh oh, we're in trouble.")
```

---

In [3]:
with open("./no_exist.txt") as f:
    pass

FileNotFoundError: [Errno 2] No such file or directory: './no_exist.txt'

In [4]:
try:
    with open("./no_exist.txt") as f:
        pass
except:
    print("Uh oh, we're in trouble.")

Uh oh, we're in trouble.


---
### Exercise 14
We can improve this, currently we have no idea what the error is.

Modify the `except` line to catch the general exception `Exception`. Using the `as` keyword, give it a useful name. Then use this in your `print` statement.

> Uh oh, we're in trouble. [Errno 2] No such file or directory: './no_exist.txt'

---

In [5]:
try:
    with open("./no_exist.txt") as f:
        pass
except Exception as e:
    print("Uh oh, we're in trouble.", e)

Uh oh, we're in trouble. [Errno 2] No such file or directory: './no_exist.txt'


---
### Exercise 15
As we have already encountered this error, we can add a more specific except clause above the general one. Make this catch the `FileNotFoundError` exception, and get it to print something different.
```
except FileNotFoundError as not_found:
    print( "Didn't find the file, see message below" )
    print ( not_found )
```
> Didn't find the file, see message below  
> [Errno 2] No such file or directory: './no_exist.txt'


This try, except structure will prevent the whole of python from erroring out. Any code after this structure will continue to be executed as normal. If this code is unrelated then it will execute just fine. Be cautious of putting code outside this try structure, if it relates to what’s inside.

The behaviour of errors being swallowed by these `except` clauses is known as Error Hiding.

---

In [6]:
try:
    with open("./no_exist.txt") as f:
        pass
except FileNotFoundError as not_found:
    print( "Didn't find the file, see message below" )
    print ( not_found )

Didn't find the file, see message below
[Errno 2] No such file or directory: './no_exist.txt'


---
## CSV
### Exercise 16
On canvas you will find a `wind_data.csv` file. Download this to the same directory as your python notebook. Just like the text files from earlier.

---

In [35]:
with open('/content/drive/MyDrive/University of Hull/Week 3/wind_data.csv') as csv_file:
    pass

---
### Exercise 17
Import csv, and read the wind_data.csv file using a context manager. The file will be in ‘read’ mode.

Create a csv_reader, passing it the file we’ve just opened.

Print each line of this reader to see what data we’re dealing with.

---

In [7]:
import csv

In [8]:
with open('/content/drive/MyDrive/University of Hull/Week 3/wind_data.csv') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        print(line)

[1;30;43mStreaming output truncated to the last 5000 lines.[0m
['27 11 2018 03:50', '3599.97802734375', '19.5851707458496', '3600', '204.169403076171']
['27 11 2018 04:00', '3600.48193359375', '19.5133304595947', '3600', '206.03140258789']
['27 11 2018 04:10', '3600.96508789062', '18.9696197509765', '3600', '204.016403198242']
['27 11 2018 04:20', '3600.89111328125', '19.5314407348632', '3600', '203.518203735351']
['27 11 2018 04:30', '3600.90502929687', '19.5506992340087', '3600', '204.87629699707']
['27 11 2018 04:40', '3601.31005859375', '19.0379905700683', '3600', '204.154800415039']
['27 11 2018 04:50', '3601.03393554687', '19.9146308898925', '3600', '204.29930114746']
['27 11 2018 05:00', '3601.31689453125', '19.0386295318603', '3600', '203.981292724609']
['27 11 2018 05:10', '3601.58911132812', '18.4850597381591', '3600', '203.671997070312']
['27 11 2018 05:20', '3601.669921875', '18.8489894866943', '3600', '203.233901977539']
['27 11 2018 05:30', '3601.80810546875', '18.31650

---
### Exercise 18
Extract the first line, as this contains the headers - Store this as a variable. Take note of the type of this line, compared to our lines from the raw file reading earlier in the workshop. What do you notice?

---

In [9]:
headers = None
with open('/content/drive/MyDrive/University of Hull/Week 3/wind_data.csv') as csv_file:
    csv_reader = csv.reader(csv_file)

    for line in csv_reader:
        headers = line
        break
print(headers)
print(type(headers))

['\ufeffDate/Time', 'LV ActivePower (kW)', 'Wind Speed (m/s)', 'Theoretical_Power_Curve (KWh)', 'Wind Direction (°)']
<class 'list'>


---
### Exercise 19

1.	Which index would we need to take to extract the “Wind Speed” attribute?
2.	For each line, obtain the Wind Speed (index the correct index).
3.	This will be a string! Convert/cast it to a float
4.	Append it to a list of wind speeds.

Note: Be careful not to include the headers in this! Just data values we want. Numbers to do analyses on.

```
print(speed)
```

---

In [10]:

speeds = []
with open('/content/drive/MyDrive/University of Hull/Week 3/wind_data.csv') as csv_file:
    csv_reader = csv.reader(csv_file)

    for line_no, line in enumerate(csv_reader):
        if line_no == 0:
            continue
        speeds.append(float(line[1]))

print(speeds)


[380.047790527343, 453.76919555664, 306.376586914062, 419.645904541015, 380.650695800781, 402.391998291015, 447.605712890625, 387.2421875, 463.651214599609, 439.725708007812, 498.181701660156, 526.816223144531, 710.587280273437, 655.194274902343, 754.762512207031, 790.173278808593, 742.985290527343, 748.229614257812, 736.647827148437, 787.246215820312, 722.864074707031, 935.033386230468, 1220.60900878906, 1053.77197265625, 1493.80798339843, 1724.48803710937, 1636.93505859375, 1385.48803710937, 1098.93200683593, 1021.4580078125, 1164.89294433593, 1073.33203125, 1165.30798339843, 1177.98999023437, 1170.53601074218, 1145.53601074218, 1114.02697753906, 1153.18505859375, 1125.3310546875, 1228.73205566406, 1021.79302978515, 957.378173828125, 909.887817382812, 1000.95397949218, 1024.47802734375, 1009.53399658203, 899.492980957031, 725.110107421875, 585.259399414062, 443.913909912109, 565.253784179687, 644.037780761718, 712.058898925781, 737.394775390625, 725.868103027343, 408.997406005859, 62

---
### Exercise 20

1.	How many wind speed records/entries do we have? How might we find this out from the List we’ve just created.
2.	What is the average Wind Speed recorded in these data? Hint: You may find the `sum` function useful here.
3.	What is the minimum Wind Speed? (Hint: `max( some_list )`)
4.	What is the maximum Wind Speed? (Hint: `min( some_list )`)

---

In [11]:
print("Total entries", len(speeds))
print("Average speed", sum(speeds)/len(speeds))
print("Min speed", min(speeds))
print("Max speed", max(speeds))


Total entries 50530
Average speed 1307.6843318793121
Min speed -2.47140502929687
Max speed 3618.73291015625


---
### Exercise 21
Using a new Context Manager, this time set for write mode. Create a CSV Writer, and write out your Wind Speed List you created in Ex18.

View this file in a notepad (or Notepad++) to verify.

---

In [12]:
with open('/content/drive/MyDrive/University of Hull/Week 3/wind_data_copy.csv', mode='w') as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(speeds)

---
## Small Introduction to Numpy
### Exercise 22
Numpy is used for numerical computing, and allows us to convert from Lists to Numpy Arrays. These types have some useful functionality defined on them!

Have a look at the following code. Numpy arrays behave very similarly to Python standard Lists; however, they have more defined functions available to them for computational purposes. Previously you had to create an expression to get the summation and then the length and calculate the mean. Numpy has a function to do this. Remember: Functions are useful for utilities and routines we may often require. Therefore, numpy bundled a lot of these as behaviours on their own Data Type numpy `ndarray`.

For scientific computing, Numpy is written with several optimisations in mind. This makes it significantly faster for calculating on large sets of data.
```
import numpy as np # Just an alias, I type np instead of numpy
x = np.array( speed )
print( type(speed) )
print( type(x) )

print(x[0] == speed[0]) # Standard Indexing. So far equivalent in how we use them!
print(x[-1] == speed[-1]) # Reverse Indexing.

# useful statistics
print("Mean:", x.mean() )
print("Max:", x.max() )
print("Min:", x.min() )

# More complex statistical measures of spread/dispersion
print(x.std()) # Standard Deviation
print(x.var()) # Variance = std ** 2
try:
	print(mean(speed)) # Try either of these.
	#print(speed.mean())
except Exception as e:
	print(e)
```

---

In [22]:
import numpy as np # Just an alias, I type np instead of numpy

In [23]:

x = np.array( speeds )
print( type(speeds) )
print( type(x) )

print(x[0] == speeds[0]) # Standard Indexing. So far equivalent in how we use them!
print(x[-1] == speeds[-1]) # Reverse Indexing.

# useful statistics
print("Mean:", x.mean() )
print("Max:", x.max() )
print("Min:", x.min() )

# More complex statistical measures of spread/dispersion
print(x.std()) # Standard Deviation
print(x.var()) # Variance = std ** 2
try:
    # print(mean(speeds)) # Try either of these.
    print(speed.mean())
except Exception as e:
    print(e)

<class 'list'>
<class 'numpy.ndarray'>
True
True
Mean: 1307.6843318793105
Max: 3618.73291015625
Min: -2.47140502929687
1312.4462551316074
1722515.1726089804
name 'speed' is not defined


---
## JSON Read/Write
### Exercise 23
Recall Davey McDuck from this week’s lectures. I have decided to add a new data attribute for our ducks, the number of people they follow on twitter. This is known as ‘following’.
```
duck_1 = {
	"first_name": "Davey",
	"last_name": "McDuck",
	"location": "Rob's Office",
	"insane": True,
	"followers": 12865,
	"following": 120,
	"weapons": ["wit", "steely stare", "devilish good looks"],
	"remorse": None
}
```
Let’s build up our duck collection. We can represent this as a List of Dictionaries. Where each Dictionary follows the same pattern for outlining a Duck. First let’s define some ducks. Feel free to add your own!
```
duck_2 = {
	"first_name": "Jim",
	"last_name": "Bob",
	"location": "Turing Lab",
	"insane": False,
	"followers": 123,
	"following": 5000,
	"weapons": ["squeak"],
	"remorse": None
}

duck_3 = {
	"first_name": "Celest",
	"last_name": "",
	"location": "Throne Room",
	"insane" : True,
	"followers": 40189,
	"following": 1, # Her other account
	"weapons": ["politics", "dance moves", "chess grandmaster", "immortality"]
}
```

We shall put these in a List called duck_collection
```
duck_collection = [ duck_1, duck_2, duck_3 ]
```
Go through the `duck_collection`, and make sure each element is in-place and all your ducks are accounted for.

---

In [13]:
duck_1 = {
    "first_name": "Davey",
    "last_name": "McDuck",
    "location": "Rob's Office",
    "insane": True,
    "followers": 12865,
    "following": 120,
    "weapons": ["wit", "steely stare", "devilish good looks"],
    "remorse": None
}

duck_2 = {
    "first_name": "Jim",
    "last_name": "Bob",
    "location": "Turing Lab",
    "insane": False,
    "followers": 123,
    "following": 5000,
    "weapons": ["squeak"],
    "remorse": None
}

duck_3 = {
    "first_name": "Celest",
    "last_name": "",
    "location": "Throne Room",
    "insane" : True,
    "followers": 40189,
    "following": 1, # Her other account
    "weapons": ["politics", "dance moves", "chess grandmaster", "immortality"]
}

In [14]:
duck_collection = [ duck_1, duck_2, duck_3 ]

---
### Exercise 24

1.	`import json`
2.	Using a context manager, open a json file for writing
3.	Using `json.dump`, write your `duck_collection to the file you’ve opened. Remember that the arguments are almost backwards from what we’re used to!

Open the file and we can copy its contents, which are rather unreadable at the moment, and use a pretty printing website to make it a bit easier on the eyes.

Paste your file contents into https://jsonformatter.org/json-pretty-print and see the output. Are all your ducks there? Do they all have their attributes?

---

In [15]:
import json

In [16]:
with open('/content/drive/MyDrive/University of Hull/Week 3/duck_collection.json', mode='w') as json_file:
    json.dump(duck_collection, json_file)

---
### Exercise 25
Using `json.load`, load your saved json file back in, assigning the output to a new list variable (other than `duck_collection`). This should be identical to your `duck_collection`. We can check for this by checking the equivalence between both lists.

---

In [17]:
with open('/content/drive/MyDrive/University of Hull/Week 3/duck_collection.json') as json_file:
    duck_collection_2 = json.load(json_file)
print(duck_collection == duck_collection_2)

True


---
### Exercise 26
Write some code which, for each duck, will calculate the difference between the number of people following them and the number of followers of their twitter account. (Positive if more people follow them, than they follow).

1.	Print these
2.	Append them to an empty list called `trendy_ducks`

E.g. for Davey, this would be his “followers” minus his “following” value. In this case 12865 - 120 = 12745. Note: This should be done programmatically! I might give you many more ducks. I want these resultant numbers for ALL ducks.

Hint: How do we index dictionaries? `some_dict["key"]` should give us the value. Our List we just got from `json.load` has a dictionary at every index of our List (Nested structures here).

> 12745  
-4877  
40188  
Trendy Ducks: [12745, -4877, 40188]

---

In [20]:
trendy_ducks = []
with open('/content/drive/MyDrive/University of Hull/Week 3/duck_collection.json') as json_file:
    duck_collection = json.load(json_file)

    for duck in duck_collection:
        print(duck["followers"] - duck["following"])
        trendy_ducks.append(duck["followers"] - duck["following"])

print("Trendy Ducks:", trendy_ducks)

12745
-4877
40188
Trendy Ducks: [12745, -4877, 40188]


---
### Exercise 27

Numpy has some useful functionality. If I wanted to find the trendiest duck (the one with the most ‘net’ followers (followers - following), I could use max or min. But this gives me the value back out, not necessarily which duck this relates to! I wanted an index so I could track the duck down.

Convert the `trendy_ducks` list to a numpy array
```
arr_trendy_ducks = np.array(trendy_ducks)
```
We can now call `argmax()` on this numpy ndarray object (or `argmin()` too) to show which duck has the most (or least).
```
print( arr_trendy_ducks.argmax() )
```
If I assign a variable to that function call, I now have the index of the trendiest duck. I can use this to go back to my original `duck_collection` List, which houses each duck dictionary, and pull things like their name.

__Print the first name of the trendiest duck, programmatically, and print out their ‘net’ following count (the thing you calculated!).__

No manual entry here. This code should work directly off of the List collection, so that I can add more ducks and your code would work exactly the same, and maybe a new Duck is crowned champion.

Hint: `duck_collection[0]` would get our first duck. We can replace 0 with any variable, so long as it returns an integer. `duck_collection[0]` would return a whole dictionary related to that duck. We can then reference the keys within that dictionary!
```
duck_collection[0]["some_key"]
```
Hint 2: You may need to cast the net followers! Numpy likes to use it’s own primitive data types for numbers. You will see :)

> __Congratulations Celest. You are the trendiest duck!  
> You have a net following of 40188 followers!

---

In [25]:
arr_trendy_ducks = np.array(trendy_ducks)
max_trendy_duck = arr_trendy_ducks.argmax()

for duck in duck_collection:
    if duck_collection.index(duck) == max_trendy_duck:
        print("__Congratulations", duck["first_name"], ". You are the trendiest duck!")
        print("You have a net following of", duck["followers"] - duck["following"], "followers!")

__Congratulations Celest . You are the trendiest duck!
You have a net following of 40188 followers!


---
### Exercise 28
Your boss has given you the task of creating a separate JSON file. In this file, he only wants ducks who have a net follower count > 0.
You must filter out the ducks who follow more accounts than who follow them, and save these ducks back out to a JSON file just like your input was.

1.	Reusing your net follower calculations find indices of all the ducks you need.
2.	Store them in a separate data structure (create this, and add the correct ducks using the indices found)
3.	Open a new file in write mode to put this JSON data into. (Use a context manager).
4.	Use `json.dump` to convert your separate data structure into JSON and store it in the file.

This exercise will involve a lot of juggling of variables, and data structures, and logic. Values and Indices will need to be managed appropriately. Try doing this before moving onto Ex 29, where we’ll look at a nice feature of Numpy which might be more helpful.

---

In [26]:
# Fing the indexes needed
inexes_needed = []
for index, value in enumerate(arr_trendy_ducks):
    if value > 0:
        inexes_needed.append(index)

print(inexes_needed)

# select the docks
douckes = []
for index in inexes_needed:
    douckes.append(duck_collection[index])

print(douckes)

# write to file
with open('/content/drive/MyDrive/University of Hull/Week 3/douckes.json', mode='w') as json_file:
    json.dump(douckes, json_file)

print(douckes)

[0, 2]
[{'first_name': 'Davey', 'last_name': 'McDuck', 'location': "Rob's Office", 'insane': True, 'followers': 12865, 'following': 120, 'weapons': ['wit', 'steely stare', 'devilish good looks'], 'remorse': None}, {'first_name': 'Celest', 'last_name': '', 'location': 'Throne Room', 'insane': True, 'followers': 40189, 'following': 1, 'weapons': ['politics', 'dance moves', 'chess grandmaster', 'immortality']}]
[{'first_name': 'Davey', 'last_name': 'McDuck', 'location': "Rob's Office", 'insane': True, 'followers': 12865, 'following': 120, 'weapons': ['wit', 'steely stare', 'devilish good looks'], 'remorse': None}, {'first_name': 'Celest', 'last_name': '', 'location': 'Throne Room', 'insane': True, 'followers': 40189, 'following': 1, 'weapons': ['politics', 'dance moves', 'chess grandmaster', 'immortality']}]


---
### Exercise 29

Numpy provides some functionality outside of just its objects. (https://numpy.org/doc/stable/reference/generated/numpy.where.html)
```
np.where( expression )
```
This returns a `ndarray` of indices where the expression holds True. For example, we can provide a whole array of values, and an expression which can be used to filter it. Numpy will then not only find where the condition is True or False, but then convert those into indices and provide them.

E.g.  
We can find all the indices of ducks, where their “net follower” count is even.
```
arr_trendy_ducks % 2 == 0
```
Normally we would pass an integer/number to this expression. However, with numpy arrays it can be applied to every value. If we print the output of this expression, we should get a List of True/False, where the condition holds!

We can pass this True/False List into `np.where()` and it will convert those into the indices for us.

Putting it together:
```
#Ex 29
print(arr_trendy_ducks % 2 == 0)

print(np.where(arr_trendy_ducks % 2 == 0))
```
> __[False False True]  
(array([2]),)__

This has returned a tuple. Remember, a tuple is defined as `(item, item, item)`. The trick here is that it’s a single element, you can see by the trailing comma. To get the actual array/List which contains the indices you’ll need to call `[0]` on the `np.where` result.
Note: If we wanted to convert a numpy array back into a List, we can cast it just like we did with numbers weeks ago.
```
actual_indices = np.where(arr_trendy_ducks % 2 == 0)[0]
print(actual_indices)
print(type(actual_indices))
print(type(list(actual_indices)))
```
> __[2]  
<class 'numpy.ndarray'>  
<class 'list'>__

_Technical Explanation: In our toy example here, we’re using a single dimension (List of numbers). But np.where is very powerful and can actually be run over N-dimensional data. Therefore each entry in the tuple is a dimension. As we only have 1, we get a single one back._

As we can see, only one of our ducks has an even count. This is the duck at index 2. Surprise, surprise, this is Celest again.



---

In [27]:
# Ex 29
print(arr_trendy_ducks % 2 == 0)

print(np.where(arr_trendy_ducks % 2 == 0))

[False False  True]
(array([2]),)


In [28]:
actual_indices = np.where(arr_trendy_ducks % 2 == 0)[0]
print(actual_indices)
print(type(actual_indices))
print(type(list(actual_indices)))

[2]
<class 'numpy.ndarray'>
<class 'list'>


---
### Exercise 30
Copy your answer to Ex 28, and now use the `np.where` from Ex 29 to make your solution tidier, cleaner, more readable. In my example I did a boolean expression for even numbers, you need to find those > 0.

Hint: The array you get back, contains elements which are the index to look-up. A list comprehension is a perfect choice here. It sounds complicated, but it’s just a List of indices you want to go through, and use them to reference the right ducks. Making a new list out of them.

> [{'first_name': 'Celest',  
  'insane': True,  
  'followers': 40189,  
  'following': 1,  
  'weapons': ['politics', 'dance moves', 'chess grandmaster', 'immortality']}]
  


---

In [29]:
# Fing the indexes needed
douckes = [ duck_collection[index] for index in np.where(arr_trendy_ducks > 0)[0]]
print(douckes)

with open('/content/drive/MyDrive/University of Hull/Week 3/douckes2.json', mode='w') as json_file:
    json.dump(douckes, json_file)




[{'first_name': 'Davey', 'last_name': 'McDuck', 'location': "Rob's Office", 'insane': True, 'followers': 12865, 'following': 120, 'weapons': ['wit', 'steely stare', 'devilish good looks'], 'remorse': None}, {'first_name': 'Celest', 'last_name': '', 'location': 'Throne Room', 'insane': True, 'followers': 40189, 'following': 1, 'weapons': ['politics', 'dance moves', 'chess grandmaster', 'immortality']}]


__The Extended Exercises are optional, and are offered as an advanced supplement for those who have completed the existing work and wish to expand on their knowledge and challenge themselves further.__

---

### There is no extended exercise this week

This workshop is considered lengthy, and conceptually difficult enough as-is. If you do wish to learn more, after completing EVERYTHING, feel free to dive deeper into Numpy.

Create more ducks, make a duck army.
Use random number generators to generate this duck army. Numbers will be easy, but what about weapons? Maybe you want to sample the duck’s weapons from a predefined list of weapons (an armory of sorts).

For more advanced data mocking, we have a library called `faker` which can do more than simple random ranges, but distributions, names, places, etc.

There is nothing stopping you now, you’ve learned Python syntax, as we continue you’re just adding more to the knowledge you have. Play around with things, go crazy.

