### Adventure Time! Let's experiment some...

Below, you will find the code from last week's lab session. We will use this code to practice, experiment, and have some adventures.

*Remember, mistakes are a normal part of learning to code, so don't worry about errors and take your time.*

Let's start with what we know. We know that we need to open a csv file called ```WikiPediaSongs.csv```. A good place to start is by importing the csv library and exploring its methods and functions. You can do this by typing ```csv.``` and then pressing tab (in Jupyter Notebook and JupyterLab, after importing the library).

Another way is by using the ```dir()``` function.

In [None]:
import csv

In [None]:
dir(csv)

In the above cell, we see a number of names (functions, classes, and more) that the csv library makes available.

If the library developers were kind, they used descriptive names.

For ```csv```, we see names like ```excel``` (a CSV dialect), ```re``` (Python's regular expression module, which csv uses internally), and ```writer```.

You don't have to know all of them – and rarely do coders know all the methods of a class or library.

***You can also see ```reader``` in the list. From its name, we can guess it's the one we need to read a csv file.***
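The raw ```dir()``` output also contains many names that begin with an underscore, which are internal details. As a small sketch (the variable name ```public_names``` is only illustrative), you could filter those out so that names such as ```reader``` are easier to spot:

```python
import csv

# Keep only the names that do not start with an underscore.
public_names = [name for name in dir(csv) if not name.startswith("_")]
print(public_names)
```

The same trick works for any library or type you pass to ```dir()```.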

### Your turn to experiment.
Use the ```dir``` function on ```str```. Do you see any familiar methods? Which ones?

In [None]:
# Your code goes here


Now, let's experiment with the csv file.

We know that the ```open()``` function is used to open all kinds of files.

We usually use open in combination with ```with```. This is a safe way to open files, but let's simplify and try to open the file without ```with```.
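To see what ```with``` does for us, here is a minimal sketch. It uses a throwaway file called ```example.txt``` in a temporary folder (both are made up for this illustration and are not part of the lab data), so it should run anywhere:

```python
import os
import tempfile

# Create a small throwaway file so the example does not depend on the lab data.
folder = tempfile.mkdtemp()
example_path = os.path.join(folder, "example.txt")

with open(example_path, "w") as f:
    f.write("hello")

# With `with`: the file is closed automatically when the block ends.
with open(example_path) as f:
    text = f.read()
print(f.closed)   # True

# Without `with`: the file stays open until we close it ourselves.
g = open(example_path)
print(g.closed)   # False
g.close()
print(g.closed)   # True
```

Forgetting that last ```close()``` is the risk we take when we skip ```with```.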

Danger zone up ahead!

In [None]:
open("WikiPediaSongs.csv")


Did you get an error? If so, that means the ```WikiPediaSongs.csv``` file is not in the folder where this jupyter notebook file is located. Find out where your file is and add the path before the file name like this:

```open("/path/to/csv/file/WikiPediaSongs.csv")```

### How to find your current folder/directory:

In [None]:
import os

print(f'Your current folder is: {os.getcwd()}')

**Note the f-string**: placing an ```f``` before string quotation marks -> ```" "``` allows us to use curly brackets -> ```{}``` as an empty spot to put our variable in. Think of it like fill in the blanks. You can use f-strings for any string, even dataframe column names:

```df[f'{path}']```
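Here is a short fill-in-the-blanks sketch. The variable names ```file_name``` and ```n_tries``` are invented for this example:

```python
file_name = "WikiPediaSongs.csv"
n_tries = 2

# Each {} is a blank that Python fills in with the value of the expression inside it.
message = f"I tried to open {file_name} {n_tries} times"
print(message)  # I tried to open WikiPediaSongs.csv 2 times

# You can put small expressions inside the curly brackets as well.
next_try = f"Next attempt will be number {n_tries + 1}"
print(next_try)  # Next attempt will be number 3
```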

### How to find where files are:

**!!!** This is just a handy function that you can reuse if you're stuck. You do not need to know any ```os``` methods and this will not be in any CCS-2 exam or assignment requirement **!!!**

In [None]:
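import os

def find_files(filename, search_path):
    # Collect the full path of every file named `filename` under `search_path`.
    result = []

    # `dirs` is unused here; naming it `dir` would hide Python's built-in dir().
    for root, dirs, files in os.walk(search_path):
        if filename in files:
            result.append(os.path.join(root, filename))
    return result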

In [None]:
path = find_files('WikipediaSongs.csv', '/Users/nyxinsane/Library/CloudStorage/OneDrive-UvA/Teaching/CCS-2')
print(f'The WikiPediaSongs.csv file is in one of {len(path)} paths:\n{path}')
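If the search comes back empty, check the capitalization of the file name. The test ```filename in files``` is an exact match, so ```WikipediaSongs.csv``` and ```WikiPediaSongs.csv``` count as different names. A small sketch of this, using a temporary folder and a made-up file ```Songs.csv``` rather than the lab data:

```python
import os
import tempfile

def find_files(filename, search_path):
    result = []
    for root, dirs, files in os.walk(search_path):
        if filename in files:
            result.append(os.path.join(root, filename))
    return result

# Build a tiny folder tree with one file in a subfolder.
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "week02"))
with open(os.path.join(top, "week02", "Songs.csv"), "w") as f:
    f.write("title\n")

print(len(find_files("Songs.csv", top)))   # 1
print(len(find_files("songs.csv", top)))   # 0, because the capital S does not match
```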

Try using ```open``` with the right file path.

In [None]:
# open("/path/to/csv/file/WikiPediaSongs.csv")


Running the cell above tells us a lot about the file. Read the output and try to figure out what ```mode='r'``` means.

Let's apply the ```csv.reader``` function to the TextIOWrapper object and assign the result to a variable ```csv_file```.

In [None]:
csv_file = csv.reader(open('/Users/nyxinsane/Library/CloudStorage/OneDrive-UvA/Teaching/CCS-2/CCS-2 Course Content/week02/WikipediaSongs.csv'))

Now, let's print the length of the ```csv_file```.

In [None]:
len(csv_file)

We got an error. Where do we go from here? When in doubt, place the object in a **list**!

#### TIP:
Some Python objects are non-subscriptable. This means you can't pick out an item with square brackets such as ```[0]```. A ```csv.reader``` is one such object, and it has no length either, which is why ```len(csv_file)``` failed. You can still loop through it with a ```for``` loop, but only once, from start to finish. Lists, however, are subscriptable, so a good first step in figuring out data structures is to use a list.

Run ```csv_file[0]``` below and read the error message carefully.

In [None]:
csv_file[0]
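If you'd like to see the same behaviour on a tiny example, here is a sketch that uses ```io.StringIO``` to pretend a short string is a file (the two rows are invented for illustration). It catches the error so the cell keeps running:

```python
import csv
import io

# io.StringIO lets us treat a string as if it were an open file.
fake_file = io.StringIO("title,artist\nSong A,Artist A\n")
reader = csv.reader(fake_file)

try:
    reader[0]
except TypeError as error:
    message = str(error)
    print(f"TypeError: {message}")

# Looping still works, even though indexing does not.
rows = []
for row in reader:
    rows.append(row)
print(rows)  # [['title', 'artist'], ['Song A', 'Artist A']]
```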

Let's try using the ```list()``` function.

In [None]:
csv_list = list(csv_file)
print(csv_list)

# That's a lot of output!

Let's try indexing the csv list using ```[0]``` to get the element/item at index 0.

In [None]:
csv_list[0]

Now try indexing the original ```list(csv_file)``` the same way using ```[0]``` (try TWICE). What happens?

In [None]:
list(csv_file)[0]

#### TIP:
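It's very important to assign your results to variables. A ```csv.reader``` hands out each row only once, so the first ```list(csv_file)``` call used up all the rows and every later call gets an empty list (which is why ```[0]``` fails). Because we saved the first result in ```csv_list```, we can keep working with it. When you work with vectorizers, you will run into something similar: you may be applying methods to the same object and changing it without realizing.

You can solve these issues by reloading the data, restarting your kernel, or running your code from the very top. Always assign to variables, and save results to files such as ```txt``` or ```csv```.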
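One way to see why the second ```list(csv_file)``` call behaves differently from the first is to try it on a tiny stand-in. This sketch again uses ```io.StringIO``` and two invented rows instead of the lab file:

```python
import csv
import io

fake_file = io.StringIO("title,artist\nSong A,Artist A\n")
reader = csv.reader(fake_file)

first_pass = list(reader)    # reads every row
second_pass = list(reader)   # nothing left to read

print(len(first_pass))   # 2
print(len(second_pass))  # 0
```

Indexing ```second_pass``` with ```[0]``` would raise an ```IndexError```, because an empty list has no first item.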

Now, let's print the length of ```csv_list```.

In [None]:
len(csv_list)

Now, let's index into ```csv_list``` and print the length of the first item AND its type.

In [None]:
len(csv_list[0])

```csv_list[0]```, or the first element of ```csv_list```, contains 59 items.

In [None]:
type(csv_list[0])

```csv_list[0]```, or the first element of ```csv_list```, is also a list.

Let's go one level deeper.

In [None]:
len(csv_list[0][0])

In [None]:
type(csv_list[0][0])

Finally, we've reached a string! But what exactly is 115?

In [None]:
print(csv_list[0][0])

The sentence printed above has 115 characters, including spaces and punctuation.

If you're not comfortable with indexing, you can also do this kind of exploration using ```for``` loops. Uncomment one ```for``` loop at a time below to see if you know what you are accessing.

In [None]:
# for i in csv_list:
#     print(type(i))
#     print(len(i))
#     for j in i:
#         print(type(j))
#         print(len(j))
#         for u in j:
#             print(type(u))
#             print(len(u))
#             print("\n")

Alright, so how do I turn this list of lists of strings into a string? Let's keep it simple and use the ```+``` operator. Below is a simplified version of what we will be doing.

In [None]:
x = ['Hey']
y = ['Hello']

In [None]:
print(x+y)

In [None]:
x.extend(y)
print(x)
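The two cells above look similar but behave differently: ```+``` builds a new list, while ```extend()``` changes the list it is called on. A short sketch that makes the difference visible (```z``` and ```returned``` are names chosen for this example):

```python
x = ["Hey"]
y = ["Hello"]

z = x + y                # builds a new list; x is unchanged
print(x)                 # ['Hey']
print(z)                 # ['Hey', 'Hello']

returned = x.extend(y)   # changes x itself and returns None
print(x)                 # ['Hey', 'Hello']
print(returned)          # None
```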

Let's try with the ```+``` operator on our data:

In [None]:
string = ""

for i in csv_list:
    for j in i:
        string+=j


In [None]:
print(string[2:15])
print(string[44:50])

In [None]:
print(f'There are {len(string)} characters in string')

If you know the ```range()``` and ```len()``` functions and the list method ```extend()```, you can combine them to iterate over ```csv_list``` and put the strings together in one list.

In [None]:
string_list = []

for index_num in range(len(csv_list)):
    print(f'We are looping through item number: {index_num} of type: {type(csv_list[index_num])} in csv_list')
    string_list.extend(csv_list[index_num])
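As an alternative sketch, ```enumerate()``` gives you the index and the item at the same time, so there is no need for ```range(len(...))```. The list ```sample_list``` below is a small invented stand-in for ```csv_list```:

```python
# A small stand-in for csv_list (invented rows, not the lab data).
sample_list = [["Song A", "Artist A"], ["Song B", "Artist B"]]

flat_list = []
for index_num, row in enumerate(sample_list):
    print(f"Item number {index_num} is of type {type(row)}")
    flat_list.extend(row)

print(flat_list)  # ['Song A', 'Artist A', 'Song B', 'Artist B']
```

Both versions do the same job; use whichever one you find easier to read.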

In [None]:
type(string_list)

In [None]:
type(string_list[0])

In [None]:
string_list[0]

In [None]:
print(f'There are {len(string_list)} sentences in string_list')

Now, let's try the infamous ```.join``` method, but on simplified data. (Deep breaths! It's only one of many, many methods you can use, as we just saw.)

Check out ```lst_of_lsts```. How would you describe it (more than "it's a list of lists")?

Uncomment each section and try to understand it.

In [None]:
lst_of_lsts = [["Hi There!"], ["How are you..?"], ["!Hello World!"]]

for x in lst_of_lsts:
    print(type(x))

# final_lst = []

# for string_lst in lst_of_lsts:
#     joined_lst = ', '.join(string_lst)
#     print(joined_lst)
#     final_lst.append(joined_lst)

# print(final_lst)
# for x in final_lst:
#     print(type(x))
# print(len(final_lst))

### WELL DONE!
Keep practicing with small chunks of code and soon you'll be able to explore all kinds of data structures :)

If you'd like to practice more, try to find out how many words are in the variable ```string```.

Want even more adventure? Try to find out how many words are in the variable ```string_list```.