# Containers (lists, strings, numpy-arrays) in Python
Containers come closest to what you know as *arrays* in other languages. 

We distinguish containers by the following properties:
- Which data can be put in a specific container type (only specific data, homogeneous data)?
- Is a container mutable (can it be modified once it is created)?
- Is there an order in the container's data (all containers that we treat here are ordered)?

You already know `numpy`-arrays and strings.
- `numpy`-arrays are: homogeneous, mutable, ordered
- strings are: homogeneous, immutable, ordered (see video within project)
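
The following short sketch illustrates what *mutable*, *immutable* and *homogeneous* mean in practice. The variable names `s`, `a` and `b` are only chosen for this illustration.

```python
import numpy

s = "abc"                    # string: immutable
a = numpy.array([1, 2, 3])   # numpy-array: mutable

a[0] = 10                    # works: an array can be modified in place
print(a)

try:
    s[0] = "x"               # fails: a string cannot be modified
except TypeError as err:
    print("TypeError:", err)

# homogeneous: all elements of an array share one data type,
# so mixing an int and a float converts everything to float
b = numpy.array([1, 2.5])
print(b.dtype)
```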

## The list container
Lists are the `most general` ordered container in Python.

They are: heterogeneous, mutable, ordered

In [None]:
# lists live within square brackets and the individual elements are separated by commas:
l = [1, 2, 3, 4] 
print(type(l))
print(l[0], l[2])   # print the first and third element of the list 'l'.
                    # Indices of container elements start with 0 and end with 'n-1'
                    # (for a container with 'n' elements), as in C
print(len(l))       # length of a list
print(l[-1], l[-2]) # a negative index -i accesses the element n - i if n is
                    # the number of elements in the container!

### Contents of a list

In [None]:
# lists can be heterogeneous and contain 'everything'(!)
import numpy 

def square(x):
    return x**2

# The following list contains an int, a float, a string, a list, a module and a function!
l = [1, 3.0, "Thomas", [1, 2], numpy, square]
print(l[2], l[3][1], l[4].pi, l[5](5))

### Creation of a list

Lists are rarely created manually. Most of the time they are the result of function calls.

In [None]:
l = [1, 2, 3, 4]  # most simple list creation
#print(l)

n = list(range(0, 10)) # list of running numbers used for for-looping
print(n)

# The following for-loop is obviously easier and shorter
# than a corresponding while-loop
for num in n:
    print(num)
    
#i = 0
#while i < 10:
#    print(i)
#    i = i + 1

In [None]:
# file globbing (shell pattern matching)
import glob

# obtain a list of all files in the figs subdirectory
l = glob.glob("./figs/*")
print(l)

# loop over the filenames
for f in l:
    print(f)

**Note:** In contrast to `numpy`-arrays, looping over list elements is the primary way to work with this container!
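
As a small sketch of this difference: with a list we process the elements one by one in a loop, while a `numpy`-array lets a single expression act on all elements at once. The names `values`, `squares` and `arr` are made up for this example.

```python
import numpy

values = [1, 2, 3, 4]

# list: process the elements one by one in a loop
squares = []
for v in values:
    squares.append(v**2)
print(squares)

# numpy-array: one expression acts on all elements at once
arr = numpy.array(values)
print(arr**2)
```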

### Element access in lists

Access to list elements (individual elements, slicing) and iteration (`for`-loops) is very similar to `numpy`-arrays and strings.

In [None]:
# sublists can be accessed via slicing
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(l[3])    # access of an individual element
print(l[1:3])  # access the sublist from the second (inclusive) to
               # the fourth (exclusive) elements
print(l[4:])   # access the sublist from the fifth element up to the end
print(l[::2])  # access the sublist with every other element
print(l[1:-1]) # negative indices also work for slices
l[1:4] = [20, 21, 22]  # note that you can use slicing also on the left
                       # side of an assignment! For a plain slice of a
                       # list, the right side may have a different size
                       # (the list then grows or shrinks); only slices
                       # with a step need a matching size. Note that
                       # this operation is only available for mutable
                       # containers!
print(l)                    
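
A short sketch of how slice assignment behaves for lists in particular: a plain slice can be replaced by a list of any length, whereas a slice with a step requires exactly as many new elements as it selects.

```python
l = [1, 2, 3, 4, 5]

l[1:3] = [20, 21, 22, 23]   # two elements are replaced by four: the list grows
print(l)                    # [1, 20, 21, 22, 23, 4, 5]

l[1:5] = []                 # assigning an empty list removes the elements
print(l)                    # [1, 4, 5]

try:
    l[::2] = [0]            # a slice with a step selects two elements here,
                            # so one new element is not enough
except ValueError as err:
    print("ValueError:", err)
```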

**Note:** Unlike `numpy`-arrays, there is no `multidimensional` list and no notion of rows and columns! A list is always one-dimensional, but it can of course contain other lists.
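
To make the contrast concrete, here is a minimal sketch that extracts a "row" and a "column" from a list of lists and from the corresponding two-dimensional `numpy`-array. The names `column_list` and `column_array` are chosen only for this illustration.

```python
import numpy

l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
a = numpy.array(l)

# a 'row' is simply one element of the outer list
print(l[1], a[1])

# a 'column' needs a loop over the inner lists ...
column_list = []
for row in l:
    column_list.append(row[1])

# ... while the array supports two-dimensional indexing
column_array = a[:, 1]
print(column_list, column_array)
```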

In [None]:
l = [[1,2,3], [4,5,6], [7,8,9]]
print(l)

# access the second element (itself a list)
print(l[1])

# chained access
print(l[1][2])

# this does not exist and raises a TypeError!
try:
    print(l[1,2])
except TypeError as err:
    print("TypeError:", err)

### List methods
Explore the list methods with the Tab key and the question mark!

In [None]:
a = [1,2,3]
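
As a starting point for the exploration, the following sketch runs through some frequently used list methods. All of them modify the list in place.

```python
l = [3, 1, 2]

l.append(4)        # add one element at the end
print(l)           # [3, 1, 2, 4]

l.extend([5, 6])   # add all elements of another list
l.insert(0, 0)     # insert the value 0 at index 0
l.remove(3)        # remove the first occurrence of the value 3
last = l.pop()     # remove and return the last element
print(last, l)

l.sort()           # sort in place
l.reverse()        # reverse in place
print(l)           # [5, 4, 2, 1, 0]
```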

**Note:** With a heterogeneous container such as a list, not many operations make sense! They all concern modification, extension and sorting (if possible). The two mathematical operations `+` and `*` are defined as for strings.

In [None]:
l = [1, 2, 3]
m = [4, 5, 6]

print(l * 2)
print(l + m)

## Attention: In-place vs. copy operations!
There are often identical operations for in-place modifications and operations on a new object. We talked about this already in the video lecture on strings!

In [None]:
l = ["Thomas", "Oliver", "Johannes"]

# l.sort is an 'in place sort'
# Use it if you do not need the old list anymore
# (memory efficiency)
l.sort()
l

In [None]:
l = ["Thomas", "Oliver", "Johannes"]

# sorted creates a new object with a sorted version of l
# Use it if you still need the original list.
m = sorted(l)
print(m, l)
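
A common pitfall with in-place methods is to assign their result: they modify the object itself and return `None`. The sketch below shows this for `sort` and also the analogous pair `reverse` (in place) and `reversed` (new object). The names `result` and `r` are only used for this illustration.

```python
l = ["Thomas", "Oliver", "Johannes"]

result = l.sort()      # sorts l itself ...
print(result)          # ... and returns None, not the sorted list!
print(l)

# the same pair exists for reversing
r = list(reversed(l))  # new list, l stays as it is
l.reverse()            # in place
print(r, l)
```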

## Example: A word count clone

We want to count the number of lines and words in a text file. The example shows how to read text files in Python and that lists and strings work well together.

In [None]:
!cat data/test.txt
!wc data/test.txt

In [None]:
# our solution here

n = range(10)
textfile = open("data/test.txt")

print(dir(textfile))
textfile.close()

In [None]:
# The following mimics the Linux wc-program

textfile = open("data/test.txt")

n_lines = 0
n_words = 0
n_chars = 0

for line in textfile:
    # get rid of the '\n' at the end of each line
    # in the textfile
    line = line.strip()
    line_split = line.split()
    
    # get the number of words in the current line. It is just
    # the number of list elements in 'line_split'
    n_words += len(line_split)
    
    # In the lecture we got the number of characters by counting
    # them from each string within line_split:
    #for word in line_split:
    #    n_chars = n_chars + len(word)
    
    # Oliver made me aware that there is an easier and
    # more accurate way; just the length of the original
    # line string! The difference of 7 to Linux in this
    # case most likely comes from the newline characters:
    # wc counts them, but we removed them with strip()
    # above.
    n_chars += len(line)
    
    # the number of lines is just how often we iterate
    # through the textfile
    n_lines += 1
    
textfile.close()

print(n_lines, n_words, n_chars)
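
A possible variant, sketched under the assumption that the newline characters should be counted as well: we take the length of each line *before* stripping anything. To keep the example self-contained it writes its own small file (the content `text` and the file name `example.txt` are made up and are not `data/test.txt`). For plain ASCII text the three numbers should then agree with the output of `wc`.

```python
import os
import tempfile

# create a small example file (made-up content, not data/test.txt)
text = "first line\nsecond line with more words\n\nlast line\n"
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as f:
    f.write(text)

n_lines = 0
n_words = 0
n_chars = 0

# 'with' closes the file automatically at the end of the block
with open(path) as textfile:
    for line in textfile:
        n_chars += len(line)          # 'line' still contains its '\n'
        n_words += len(line.split())  # split() ignores surrounding whitespace
        n_lines += 1

print(n_lines, n_words, n_chars)
```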