### Let's experiment!

Below, you will find the code from last week's lab session. We will use this code to practice, experiment, and have some fun. Mistakes are a normal part of learning to code, so don't worry about errors and take your time.

First, we know that we need to open a csv file called ```WikiPediaSongs.csv```. A good place to start is by importing the csv library and exploring its methods and functions. You can do this by typing ```csv.``` and then pressing tab (this works in Jupyter Notebook and JupyterLab after importing the library).

Another way is by using the ```dir()``` function.

In [1]:
import csv

In [2]:
dir(csv)

['Dialect',
 'DictReader',
 'DictWriter',
 'Error',
 'QUOTE_ALL',
 'QUOTE_MINIMAL',
 'QUOTE_NONE',
 'QUOTE_NONNUMERIC',
 'Sniffer',
 'StringIO',
 '_Dialect',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '__version__',
 'excel',
 'excel_tab',
 'field_size_limit',
 'get_dialect',
 'list_dialects',
 're',
 'reader',
 'register_dialect',
 'unix_dialect',
 'unregister_dialect',
 'writer']

In the above cell, we see the names of the functions, classes, and other objects that the csv library provides. If the library developers were kind, they use descriptive names. For ```csv```, we see names like ```excel``` (a dialect that describes how Excel formats csv files), ```re``` (Python's regular expression module, which ```csv``` imports for its own use), and ```writer```. You don't have to know all of them, and coders rarely know everything a class or library offers. You can also see ```reader``` in the list. From its name, we can guess it's the function we need to read a csv file.
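As a small sketch of how to narrow down that long list, the cell below drops the names that start with an underscore and then keeps only the ones that can be called. The variable names ```public_names``` and ```callable_names``` are made up for this illustration and are not part of the lab code.

```python
import csv

# Keep only the public names (those that do not start with an underscore).
public_names = [name for name in dir(csv) if not name.startswith('_')]

# Of those, keep the names that refer to something callable (functions and classes).
callable_names = [name for name in public_names if callable(getattr(csv, name))]

print(public_names)
print('reader' in callable_names)
```

Once a name looks promising, ```help(csv.reader)``` prints its documentation.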

### Your turn to experiment.
Use the ```dir``` function on ```str```. Do you see any familiar methods? Which ones?

In [3]:
# Your code goes here


Now, let's experiment with the csv file. We know that the ```open()``` function is used to open all kinds of files. We usually use open in combination with ```with```. This is a safe way to open files, but let's simplify and try to open the file without ```with```. Danger zone up ahead!

In [4]:
open("WikiPediaSongs.csv")


FileNotFoundError: [Errno 2] No such file or directory: 'WikiPediaSongs.csv'

Did you get an error? If so, that means the ```WikiPediaSongs.csv``` file is not in the folder where this Jupyter notebook file is located. Find out where your file is and add the path before the file name like this:

```open("/path/to/csv/file/WikiPediaSongs.csv")```
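Here is a minimal sketch of checking a path before opening it, and of the safer ```with``` pattern mentioned above. It creates its own throwaway file, so the folder and the file name ```example.csv``` are assumptions for illustration only and have nothing to do with the lab data.

```python
import os
import tempfile

# Create a small throwaway file so the example runs anywhere.
folder = tempfile.mkdtemp()
example_path = os.path.join(folder, 'example.csv')
with open(example_path, 'w') as f:
    f.write('title,year\n')

# Check that the file exists before trying to open it.
print(os.path.exists(example_path))

# The with statement closes the file automatically when the block ends.
with open(example_path) as f:
    first_line = f.readline()

print(first_line)
print(f.closed)
```

If ```os.path.exists``` returns ```False``` for your own path, the path is wrong and ```open``` would raise a ```FileNotFoundError```.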

### How to find your current folder/directory:

In [5]:
import os  # needed for os.getcwd() here and os.walk() below
print(f'Your current folder is:{os.getcwd()}')

Your current folder is:/Users/nyxinsane/Library/CloudStorage/OneDrive-UvA/Teaching/CCS-2/CCS-2 Course Content/week02/exercises


### How to find where files are:

In [6]:
def find_files(filename, search_path):
    result = []

    for root, dirs, files in os.walk(search_path):
        if filename in files:
            result.append(os.path.join(root, filename))
    return result

In [7]:
path = find_files('WikipediaSongs.csv', '/Users/nyxinsane/Library/CloudStorage/OneDrive-UvA/Teaching/CCS-2')
print(f'The WikiPediaSongs.csv file is in one of {len(path)} paths:\n{path}')

The WikiPediaSongs.csv file is in one of 4 paths:
['/Users/nyxinsane/Library/CloudStorage/OneDrive-UvA/Teaching/CCS-2/CCS-2 Course Content/week01/WikipediaSongs.csv', '/Users/nyxinsane/Library/CloudStorage/OneDrive-UvA/Teaching/CCS-2/CCS-2 Course Content/week02/WikipediaSongs.csv', '/Users/nyxinsane/Library/CloudStorage/OneDrive-UvA/Teaching/CCS-2/CCS-2 Tutorial Material/Week 1/exercises/WikipediaSongs.csv', '/Users/nyxinsane/Library/CloudStorage/OneDrive-UvA/Teaching/CCS-2/CCS2_Exercises_Assessments/Tutorial_exercises/WikipediaSongs.csv']
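As an alternative sketch, the same kind of search can be written with ```pathlib```. The folder tree and the file name ```songs.csv``` below are made up so that the example runs on any computer; swap in your own folder and file name.

```python
import tempfile
from pathlib import Path

# Build a throwaway folder tree containing one matching file.
search_root = Path(tempfile.mkdtemp())
(search_root / 'week01').mkdir()
(search_root / 'week01' / 'songs.csv').write_text('title\n')

# rglob searches the folder and all of its subfolders for the given name.
matches = list(search_root.rglob('songs.csv'))

print(len(matches))
print(matches[0].name)
```

Note that the ```filename in files``` check in ```find_files``` compares names exactly, so ```WikipediaSongs.csv``` and ```WikiPediaSongs.csv``` count as two different names there.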


Try using ```open``` with the right file path:

In [8]:
# open("WikiPediaSongs.csv")


Running the cell above tells us a lot about the file. Read the output and try to figure out what ```mode='r'``` means.

Let's apply the ```csv.reader``` function to the TextIOWrapper object and assign the result to a variable ```csv_file```.

In [9]:
csv_file = csv.reader(open(path[1]))

Now, let's print the length of the ```csv_file```.

In [10]:
len(csv_file)

TypeError: object of type '_csv.reader' has no len()

We got an error. Where do we go from here? When in doubt, place the object in a list!

#### TIP:
Some Python objects are non-subscriptable. This means the objects can't be indexed or sliced with square brackets such as ```[0]```, and they have no length. A ```csv.reader``` is one such object. It can still be looped through in a ```for``` loop, but only one row at a time and only once. Try using ```[0]``` after ```csv_file``` below. Lists, however, are subscriptable, so a good first step in figuring out data structures is to use a list.
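The difference between looping and subscripting can be sketched on a tiny made-up example. Here ```io.StringIO``` stands in for an open file, and the two rows of data are invented for illustration.

```python
import csv
import io

# io.StringIO stands in for an open file so the example needs no file on disk.
fake_file = io.StringIO('title,year\nSong A,2001\n')
reader = csv.reader(fake_file)

# Looping works: a reader hands out one row (a list of strings) at a time.
rows_seen = []
for row in reader:
    rows_seen.append(row)
print(rows_seen)

# Using [0] does not work on the reader itself.
try:
    reader[0]
except TypeError as error:
    print(error)

# Using [0] works once the rows are in a list.
print(rows_seen[0])
```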

In [11]:
csv_file[0]

TypeError: '_csv.reader' object is not subscriptable

Let's try using the ```list()``` function.

In [12]:
csv_list = list(csv_file)
print(csv_list)

[['\ufeff"Molly Malone (also known as ""Cockles and Mussels"" or ""In Dublin\'s Fair City"") is a popular song set in Dublin', ' Ireland', ' which has become its unofficial anthem. A statue representing Molly Malone was unveiled on Grafton Street by then Lord Mayor of Dublin', ' Ben Briscoe', ' during the 1988 Dublin Millennium celebrations', ' when 13 June was declared to be Molly Malone Day. In July 2014', ' the statue was relocated to Suffolk Street', ' in front of the Tourist Information Office', ' to make way for Luas track-laying work at the old location. The song tells the fictional tale of a fishwife who plied her trade on the streets of Dublin and died young', ' of a fever. In the late 20th century', ' a legend grew up that there was a historical Molly', ' who lived in the 17th century. She is typically represented as a hawker by day and part-time prostitute by night.[1] In contrast', ' she has also been portrayed as one of the few chaste female street hawkers of her day. Ther

# That's a lot of output!

Let's try slicing the csv list using ```[0]``` to get the element/item at the index 0.

In [13]:
csv_list[0]

['\ufeff"Molly Malone (also known as ""Cockles and Mussels"" or ""In Dublin\'s Fair City"") is a popular song set in Dublin',
 ' Ireland',
 ' which has become its unofficial anthem. A statue representing Molly Malone was unveiled on Grafton Street by then Lord Mayor of Dublin',
 ' Ben Briscoe',
 ' during the 1988 Dublin Millennium celebrations',
 ' when 13 June was declared to be Molly Malone Day. In July 2014',
 ' the statue was relocated to Suffolk Street',
 ' in front of the Tourist Information Office',
 ' to make way for Luas track-laying work at the old location. The song tells the fictional tale of a fishwife who plied her trade on the streets of Dublin and died young',
 ' of a fever. In the late 20th century',
 ' a legend grew up that there was a historical Molly',
 ' who lived in the 17th century. She is typically represented as a hawker by day and part-time prostitute by night.[1] In contrast',
 ' she has also been portrayed as one of the few chaste female street hawkers of he

Now try calling ```list(csv_file)``` again and slicing the result the same way using ```[0]```. What happens?

In [14]:
list(csv_file)[0]

IndexError: list index out of range

#### TIP:
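It's very important to assign your results to variables. A ```csv.reader``` can only be read through once: the first ```list(csv_file)``` used up all of its rows, so the next call returned an empty list, and ```[0]``` on an empty list raises an ```IndexError```. Later, when you work with vectorizers, you may similarly find that you are applying methods to the same object and changing it without realizing. You can solve these issues by reloading the data, restarting your kernel, or running your code from the very top. Always assign to variables, and save results you want to keep to files such as ```txt``` or ```csv```.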
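A minimal sketch of this "use it once" behaviour, again with made-up data in an ```io.StringIO``` object instead of a real file:

```python
import csv
import io

reader = csv.reader(io.StringIO('a,b\nc,d\n'))

first_pass = list(reader)   # reads every row out of the reader
second_pass = list(reader)  # nothing is left to read

print(first_pass)
print(second_pass)
```

The second list is empty, which is why asking for its element ```[0]``` fails.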

Now, let's print the length of ```csv_list```.

In [15]:
len(csv_list)

5

Now, let's slice ```csv_list``` and print the length of the slice AND its type.

In [16]:
len(csv_list[0])

59

```csv_list[0]```, or the first element of ```csv_list```, contains 59 items.

In [17]:
type(csv_list[0])

list

```csv_list[0]```, or the first element of ```csv_list```, is also a list.

Let's slice further.

In [18]:
len(csv_list[0][0])

115

In [19]:
type(csv_list[0][0])

str

Finally, we've reached a string! But what exactly is 115?

In [20]:
print(csv_list[0][0])

"Molly Malone (also known as ""Cockles and Mussels"" or ""In Dublin's Fair City"") is a popular song set in Dublin


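The sentence printed above has 115 characters, counting spaces, punctuation, and the invisible ```\ufeff``` character at the very start (you can see it in the ```csv_list[0]``` output earlier).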
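The ```\ufeff``` at the start of the first string is a byte order mark that some programs write at the beginning of a file. The sketch below uses a tiny made-up file to show what such a mark does to ```csv.reader``` and how the ```utf-8-sig``` encoding removes it. Whether the lab file behaves in exactly the same way is an assumption; the file name ```bom_example.csv``` and its contents are invented.

```python
import csv
import os
import tempfile

# Write a tiny file that starts with a byte order mark, followed by a quoted field.
folder = tempfile.mkdtemp()
bom_path = os.path.join(folder, 'bom_example.csv')
with open(bom_path, 'wb') as f:
    f.write(b'\xef\xbb\xbf"a, b",c\n')

# Plain utf-8 keeps the mark as the first character of the first field.
with open(bom_path, encoding='utf-8', newline='') as f:
    rows_plain = list(csv.reader(f))

# utf-8-sig drops the mark while decoding.
with open(bom_path, encoding='utf-8-sig', newline='') as f:
    rows_sig = list(csv.reader(f))

print(rows_plain)
print(rows_sig)
```

In the first result the mark sits in front of the opening quote, so the quoted field is not recognised and gets split at its comma. In the second result the quoted field stays in one piece.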

If you're not comfortable with slicing, you can also do this kind of exploration using ```for``` loops. Uncomment one ```for``` loop at a time below to see if you know what you are accessing.

In [21]:
# for i in csv_list:
#     print(type(i))
#     print(len(i))
#     for j in i:
#         print(type(j))
#         print(len(j))
#         for u in j:
#             print(type(u))
#             print(len(u))
#             print("\n")

Alright, so how do I turn this list of lists of strings into a string? Let's keep it simple and use the ```+``` operator. Below is a simplified version of what we will be doing.

In [22]:
x = ['Hey']
y = ['Hello']

In [23]:
print(x+y)

['Hey', 'Hello']


In [24]:
x.extend(y)
print(x)

['Hey', 'Hello']
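The two cells above print the same thing, but they work differently. This short sketch, which reuses the same ```x``` and ```y``` and adds the illustrative names ```combined``` and ```result```, shows that ```+``` builds a new list while ```extend()``` changes the existing one.

```python
x = ['Hey']
y = ['Hello']

combined = x + y      # builds a new list; x itself is unchanged
print(combined)
print(x)

result = x.extend(y)  # changes x in place and returns None
print(x)
print(result)
```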


If you know the ```range()``` and ```len()``` functions and the ```extend()``` list method, you can combine them to iterate over ```csv_list``` and put together strings.

In [25]:
string_list = []

for index_num in range(len(csv_list)):
    print(f'We are looping through item number: {index_num} of type: {type(csv_list[index_num])} in csv_list')
    string_list.extend(csv_list[index_num])

We are looping through item number: 0 of type: <class 'list'> in csv_list
We are looping through item number: 1 of type: <class 'list'> in csv_list
We are looping through item number: 2 of type: <class 'list'> in csv_list
We are looping through item number: 3 of type: <class 'list'> in csv_list
We are looping through item number: 4 of type: <class 'list'> in csv_list
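The same flattening can also be written without index numbers by looping over the inner lists directly. This is only a sketch on a made-up list called ```nested```, which stands in for ```csv_list```.

```python
# A made-up stand-in for csv_list: a list of lists of strings.
nested = [['a', 'b'], ['c'], ['d', 'e']]

flat = []
for inner in nested:
    # Add every string from the inner list to the end of flat.
    flat.extend(inner)

print(flat)
print(len(flat))
```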


In [26]:
type(string_list)

list

In [27]:
type(string_list[0])

str

In [28]:
string_list[0]

'\ufeff"Molly Malone (also known as ""Cockles and Mussels"" or ""In Dublin\'s Fair City"") is a popular song set in Dublin'

In [29]:
print(f'There are {len(string_list)} sentences in string_list')

There are 63 sentences in string_list


Now, let's try the ```.join``` method, but on simplified data (deep breaths! It's only one of many, many methods you can use). Uncomment each section and try to understand it.

In [30]:
lst_of_lsts = [["Hi There!"], ["How are you..?"], ["!Hello World!"]]

for x in lst_of_lsts:
    print(type(x))

# final_lst = []

# for string_lst in lst_of_lsts:
#     joined_lst = ', '.join(string_lst)
#     print(joined_lst)
#     final_lst.append(joined_lst)

# print(final_lst)
# for x in final_lst:
#     print(type(x))
# print(len(final_lst))

<class 'list'>
<class 'list'>
<class 'list'>
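As a small sketch of where this is heading, ```.join``` is called on the separator string and glues a list of strings into one string. The names ```words```, ```sentence```, ```nested```, and ```flat``` below are made up for this illustration.

```python
words = ['Hey', 'Hello']

# The string before .join is placed between the items.
sentence = ' '.join(words)
print(sentence)
print(type(sentence))

# join only accepts strings, so a list of lists has to be flattened first.
nested = [['Hi There!'], ['How are you..?']]
flat = [text for inner in nested for text in inner]
joined = ' '.join(flat)
print(joined)
```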


### WELL DONE!
Keep practicing with small chunks of code and soon you'll be able to explore all kinds of data structures :)