
# Introductory exercises

**Instructions:** Complete the exercises in this notebook and submit it via Canvas.  In all of these exercises, use pure Python without any libraries.


## Part 1: Working with CSV files

CSV (comma separated values) is one of the basic formats for distributing data.  CSV files are used to represent data that is in the form of a matrix.  For example: 

\begin{pmatrix}
12 & 13 & 1\\
3 & 5 & 2
\end{pmatrix}

Let's create a CSV file that contains this matrix:

In [7]:
data = """12,13,1
3,5,2
"""
file_handle = open("data.csv", "w")
file_handle.write(data)
file_handle.close()

The following prints the contents of the file.  The cell uses the Windows command `!type`; on macOS or Linux, use `!cat` instead:

In [9]:
!type data.csv

12,13,1
3,5,2


### Reading a CSV file

Write a function called `csv_read(file_name)` that reads the data stored in the given file and returns a matrix as a list of lists.  For the file above, the call

```python
matrix = csv_read("data.csv")
```

should give you the matrix

```python
[[12.0, 13.0, 1.0], [3.0, 5.0, 2.0]]
```

and

```python
>>> matrix[0]
[12.0, 13.0, 1.0]
```

```python
>>> matrix[1][2]
2.0
```

In [11]:
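def csv_read(file_name):
    """Read a CSV file of numbers and return it as a list of lists of floats."""
    try:
        file_handle = open(file_name)   # file_name is the name of the file
    except OSError:
        return -1   # the file could not be opened
    with file_handle:
        matrix = []
        for line in file_handle:
            if line.strip():   # skip blank lines
                # float() ignores surrounding whitespace, including the newline
                matrix.append([float(value) for value in line.split(",")])
        return matrix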

The following won't do much until you provide an implementation for `csv_read`.  The Python `pass` keyword is a command that does nothing, and is a placeholder for your implementation.

In [12]:
matrix = csv_read("data.csv")
print(matrix)

[[12.0, 13.0, 1.0], [3.0, 5.0, 2.0]]


Some pointers to get you started:


First, here's the Pythonic way of reading and processing a file:

```Python
    try:
        file_handle = open(file_name)   # file_name is the name of the file
    except OSError:   # raised when the file cannot be opened
        return -1
    with file_handle:
        for line in file_handle:
            pass  # process each line here

The `try-except` block handles the case where the file name does not correspond to a file that can be opened.
More details on reading information from files are provided in the supplementary python_files notebook.

For processing each line, we recommend using a string's [split](https://docs.python.org/3.7/library/stdtypes.html?highlight=split#str.split) method.
To convert the resulting strings to floating point numbers, use the `float` function.
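
As a minimal sketch of these two pointers, the snippet below parses a single line of text. The helper name `parse_line` is hypothetical and introduced only for illustration.

```python
def parse_line(line):
    """Convert one comma-separated line of text into a list of floats."""
    # strip() removes the trailing newline; split(",") yields the fields as strings
    fields = line.strip().split(",")
    return [float(field) for field in fields]

print(parse_line("12,13,1\n"))  # [12.0, 13.0, 1.0]
```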

### Operations on matrices

As a second exercise, write two functions that return the sum of the elements in the rows/columns of the matrix:

In [15]:
def sum_columns(matrix) :
    """
    return a list where element i of the list contains the sum
    of all elements in column i of the input matrix.
    
    for example, using the matrix [[12.0, 13.0, 1.0], [3.0, 5.0, 2.0]]
    as input, should produce the return value [15.0, 18.0, 3.0]
    """
    columns = []
    for i in range(len(matrix[0])):
        total = 0   # avoid shadowing the built-in sum()
        for j in range(len(matrix)):
            total += matrix[j][i]
        columns.append(total)
    return columns

def sum_rows(matrix) :
    """
    return a list where element i of the list contains the sum
    of all elements in row i of the input matrix.
    
    for example, using the matrix [[12.0, 13.0, 1.0], [3.0, 5.0, 2.0]]
    as input, should produce the return value [26.0, 10.0]
    """
    rows = []
    for i in range(len(matrix)):
        total = 0
        for j in range(len(matrix[0])):
            total += matrix[i][j]
        rows.append(total)
    return rows

In [16]:
# code for verifying your implementation
print(sum_columns(matrix))
print(sum_rows(matrix))

[15.0, 18.0, 3.0]
[26.0, 10.0]
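
The same sums can also be written more compactly in pure Python. The sketch below is one possible alternative, not the expected solution; the names `sum_rows_builtin` and `sum_columns_zip` are hypothetical.

```python
def sum_rows_builtin(matrix):
    """Sum each row with the built-in sum()."""
    return [sum(row) for row in matrix]

def sum_columns_zip(matrix):
    """Sum each column; zip(*matrix) regroups the rows into columns."""
    return [sum(column) for column in zip(*matrix)]

example = [[12.0, 13.0, 1.0], [3.0, 5.0, 2.0]]
print(sum_columns_zip(example))   # [15.0, 18.0, 3.0]
print(sum_rows_builtin(example))  # [26.0, 10.0]
```

Note that `zip` stops at the shortest row, so this version assumes every row has the same number of entries.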


### CSV files in practice

CSV files are so common that the Python standard library includes a module called `csv`.  Details are in the [Python documentation](https://docs.python.org/3/library/csv.html).  As a follow-up, you can rewrite your `csv_read` function using the `csv` module.
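
One way such a rewrite might look is sketched below. The function name `csv_read_with_module` is hypothetical, and the sketch assumes every field in the file is numeric. Unlike the earlier version, it lets the exception propagate if the file cannot be opened instead of returning -1.

```python
import csv

def csv_read_with_module(file_name):
    """Read a numeric CSV file into a list of lists of floats using csv.reader."""
    matrix = []
    # newline="" is what the csv documentation recommends when opening files
    with open(file_name, newline="") as file_handle:
        for row in csv.reader(file_handle):
            if row:  # csv.reader yields an empty list for a blank line
                matrix.append([float(value) for value in row])
    return matrix
```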

Reading CSV files is such a common task that there are other options in [NumPy](https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html) and [pandas](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html).

## Part 2: Python lists: slicing and list comprehension

**Slices** allow you to create sublists of existing lists.  

The syntax for slicing is as follows:

```Python
sequence [start:stop[:step]]
```

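* `start`: Optional. Index of the first element in the slice. Defaults to 0.
* `stop`: Optional. Index at which the slice ends; the element at this index is not included. Defaults to `len(sequence)`.
* `step`: Optional. Extended slice syntax. Step (stride) between consecutive indices. Defaults to 1.

The defaults listed for `start` and `stop` apply when `step` is positive.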

For example:

In [18]:
values = [1,2,3,4,5,6,7,8]
values[1:5]

[2, 3, 4, 5]

Next, try out the following commands:

```python
values[1:3]  
values[2:-1] 
values[:2]   
values[2:]   
values[::2] # the last value is the step/stride
```

In [27]:
values[1:3]

[2, 3]

In [22]:
values[2:-1]

[3, 4, 5, 6, 7]

In [23]:
values[:2]

[1, 2]

In [33]:
values[2:]

[3, 4, 5, 6, 7, 8]

In [31]:
values[::2]

[1, 3, 5, 7]

Based on your experiments, answer the following:

* What happens if you omit the start/end index?
* What is the effect of using negative indices for the start or end index?
* What is the effect of using a negative step/stride?

*If you omit the start index, the slice begins at the first element (index 0); if you omit the end index, the slice continues through the end of the sequence.*<br>
<br>
*A negative index is counted from the end of the sequence. As a start index, `values[-2:]` begins at the second-to-last element and returns the last two elements, `[7, 8]` in the example above. As an end index, `values[:-1]` stops just before the last element and returns `[1, 2, 3, 4, 5, 6, 7]`.*<br>
<br>
*With a negative step/stride, the slice starts at the end of the sequence by default and steps backward: `values[::-2]` returns `[8, 6, 4, 2]` in the example above.*
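
These answers can be checked directly. The snippet below is a small sketch that restates each claim as an assertion on the same example list; the variable `n` is introduced only for illustration.

```python
values = [1, 2, 3, 4, 5, 6, 7, 8]
n = len(values)

# Omitted start/stop default to the two ends of the list
assert values[:2] == values[0:2]
assert values[2:] == values[2:n]

# A negative index k refers to position n + k
assert values[-2:] == values[n - 2:] == [7, 8]
assert values[:-1] == values[:n - 1] == [1, 2, 3, 4, 5, 6, 7]

# A negative step walks backward, starting from the last element by default
assert values[::-2] == [8, 6, 4, 2]
print("all slice checks passed")
```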

* Write code that reverses a list using a slice (hint:  negative strides).

In [35]:
values[::-1]

[8, 7, 6, 5, 4, 3, 2, 1]

### List comprehension

List comprehension is a very convenient piece of Python syntax for creating lists.  Here are a few quick exercises to help you familiarize (or re-familiarize) yourself with this tool.  If you need more information on list comprehensions, the notebook on Python types and functions contains an overview of the topic.

* You are given a list of integers.  Using a single list comprehension, create a sublist of the original list that contains only the even numbers from the list.

For example, given the list
```Python
values = [2,8,11,3,6,2]
```
The result should be the list
```Python
[2,8,6,2]
```

In [60]:
values = [2, 8, 11, 3, 6, 2]
even = [i for i in values if i % 2 == 0]
print(even)

[2, 8, 6, 2]


* Write a list comprehension that produces the first 10 positive multiples of 3.

In [59]:
multiples_3 = [j for j in range(1,31) if j % 3 == 0]
print(multiples_3)

[3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
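
An alternative sketch generates each multiple directly instead of filtering a range, so the upper bound of 30 does not need to be worked out in advance. The variable name `multiples_of_3` is hypothetical.

```python
# Multiply each k in 1..10 by 3 rather than testing every number up to 30
multiples_of_3 = [3 * k for k in range(1, 11)]
print(multiples_of_3)  # [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
```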


### Python dictionaries

**Most common substring**.  Write a function called `most_common_substring(s, length)` that returns the substring of the given length that occurs most often within the input string `s`.  For example, on the input `'mississippi', 4`, the return value should be `'issi'`, as it is the only substring of length 4 that occurs twice, and no substring of length 4 occurs more than twice.  Hint:  use slices to extract substrings of the appropriate length, and use a dictionary to keep track of the number of occurrences of all substrings of the given length.

In [57]:
def most_common_substring(s, length):
    counts = {}
    for i in range(len(s) - length + 1):
        piece = s[i:i + length]
        # dict.get returns 0 the first time a substring is seen
        counts[piece] = counts.get(piece, 0) + 1
    # the key with the largest count; ties go to the substring seen first
    return max(counts, key=counts.get)

In [58]:
most_common_substring('mississippi', 4)

'issi'
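
For comparison, the counting step can be delegated to `collections.Counter`. This is only a sketch: `Counter` is part of the standard library, so check whether it is allowed under the "pure Python" rule of this notebook. The name `most_common_substring_counter` is hypothetical, and returning `None` when `length` exceeds `len(s)` is an assumption of this sketch, not part of the exercise.

```python
from collections import Counter

def most_common_substring_counter(s, length):
    """Return the most frequent substring of the given length, or None if there is none."""
    counts = Counter(s[i:i + length] for i in range(len(s) - length + 1))
    if not counts:
        return None  # no substring of that length exists
    substring, _count = counts.most_common(1)[0]
    return substring

print(most_common_substring_counter('mississippi', 4))  # issi
```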