## Algorithms and Data Structures in Python - Assignment 3A ##

The following assignment will test your understanding of topics covered in the first four weeks of the course. This assignment **will count towards your grade** and should be submitted through Canvas by **07.10.2021 at 08:59 (CEST)**. You must work and submit in groups of three students. You can get at most 5 points for Assignment 3A, which is 5\% of your final grade. Assignment 3B will be released next week. Please submit your notebooks for 3A and 3B together.

1. For submission, please rename your notebook as ```{first_student_id}_{second_student_id}_{third_student_id}_3A.ipynb```. For example, a submission by students with student ID numbers *11760001*, *11760002* and *11760003* should have the filename ```11760001_11760002_11760003_3A.ipynb```.

2. Please follow the function prototype specified in the question when writing your code. The usage of additional functions is acceptable unless the problem expressly prohibits it. If the prototype is modified, your submission will fail the automated testing steps.

3. All submissions will be checked for code similarity. Submissions with high similarity will be summarily rejected and no points will be awarded.

4. Please do NOT use the ```input()``` function in your code. 

5. For each exercise, a correct solution counts for 80% of the exercise's points, while code style counts for the remaining 20%. Please make sure that you explain what your implementation does using comments.

6. Do **not** externally modify the CSV data file accompanying this assignment.

7. For this assignment, the usage of the ```pandas``` library is **not** allowed.

#### Problem 1 : Loading and Preparing CSV Data [1 POINT] #### 

In this exercise, you will use a dataset on Population Dynamics provided by the United Nations (Department of Economic and Social Affairs) [1]. The homepage for this data is [here](https://population.un.org/wpp/Download/Standard/CSV/). This data contains population statistics for every country in the world.

This data is provided to you as a CSV file (```data.csv```). CSV files are text files that contain data 'delimited' by a character, usually a comma. When read by a program, these files can be easily converted into simple spreadsheets. CSV files can be opened with Microsoft Excel or LibreOffice Calc. Since CSV files are simple to load and do not require specialized commercial software for usage, they are extremely popular as a data-distribution format for tabular data.
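
As a small illustration of what 'delimited' means, the sketch below parses a two-line, comma-delimited text with the ```csv``` module. The sample text is a made-up excerpt used only for illustration, and it is read from an in-memory string rather than from ```data.csv```.

```python
import csv
import io

# Hypothetical in-memory sample; the real assignment reads data.csv instead.
sample_text = "LocID,Location,Time\n4,Afghanistan,1950\n"

# csv.reader splits each line on the delimiter and yields a list of strings.
rows = list(csv.reader(io.StringIO(sample_text), delimiter=','))

print(rows)  # [['LocID', 'Location', 'Time'], ['4', 'Afghanistan', '1950']]
```

Each line of text becomes one inner list, which is the list-of-lists shape used throughout this assignment.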

NOTE : When I talk about row numbers below, they start from 1 as you would generally see in spreadsheet software like Calc or Excel. Please do not forget that Python uses zero-based indexing (indices start from 0 instead of 1).
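
To make the difference between the two numbering schemes concrete, here is a minimal sketch on a made-up list-of-lists. The helper name ```spreadsheet_row``` is hypothetical and only serves this illustration.

```python
# Hypothetical list-of-lists, shaped like a loaded CSV file (header row first).
stats = [
    ['Location', 'Time'],       # spreadsheet row 1 -> stats[0]
    ['Afghanistan', '1950'],    # spreadsheet row 2 -> stats[1]
    ['Afghanistan', '1951'],    # spreadsheet row 3 -> stats[2]
]

def spreadsheet_row(stats, row_number):
    """Return the row with the given 1-based spreadsheet row number."""
    return stats[row_number - 1]

first_row = spreadsheet_row(stats, 1)
third_row = spreadsheet_row(stats, 3)
print(first_row)  # ['Location', 'Time']
print(third_row)  # ['Afghanistan', '1951']
```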

In this exercise, you will use the ```csv``` library to load CSV files. The usage of data analysis packages like ```pandas``` is prohibited. The usage of NumPy is acceptable.

For Problem 1, you will implement a function called ```load_data_from_csv(filepath)``` that follows the instructions below:

1.  Load data from the given CSV data file. You can use the following minimal code to achieve this:

```python
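    import csv

    # Read every row of the CSV file into a list-of-lists.
    stats = []
    # newline='' is the setting recommended by the csv module documentation.
    with open(filepath, 'r', newline='') as rf:
        reader = csv.reader(rf, delimiter=',')
        for row in reader:
            stats.append(row)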
```

Please note that the code snippet above only contains the basic code for loading a CSV file into Python as a list-of-lists. You may still need to process the loaded data as described in the following steps.

2. Separate the ```headers``` (first row of the data that contains the column names) and ```data``` that contains the subsequent rows with actual data.

3. Return ```headers``` (as a list) and ```data``` (as a list-of-lists).
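
Keep in mind that, with its default settings, ```csv.reader``` returns every field as a string, so numeric columns need an explicit conversion before any arithmetic. The following is a minimal sketch on a made-up two-row sample; the helper name ```column_as_floats``` is hypothetical and not part of the required interface.

```python
# Hypothetical sample in the same shape as the returned headers and data.
headers = ['Location', 'Time', 'PopTotal']
data = [
    ['Afghanistan', '1950', '7752.117'],
    ['Afghanistan', '1951', '7840.151'],
]

def column_as_floats(data, headers, column_name):
    """Return the named column with every value converted to float."""
    col = headers.index(column_name)  # position of the column in each row
    return [float(row[col]) for row in data]

pop_total = column_as_floats(data, headers, 'PopTotal')
print(pop_total)  # [7752.117, 7840.151]
```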

A template for this function is provided in the code block below.

In [14]:
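import csv

def load_data_from_csv(filepath):
    # Read every row of the CSV file into a list-of-lists.
    stats = []
    # newline='' is the setting recommended by the csv module documentation.
    with open(filepath, 'r', newline='') as rf:
        reader = csv.reader(rf, delimiter=',')
        for row in reader:
            stats.append(row)
    # The first row holds the column names; the remaining rows hold the data.
    headers = stats[0]
    data = stats[1:]
    return data, headers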

load_data_from_csv("/Users/florianburger/Desktop/data.csv")

['LocID', 'Location', 'VarID', 'Variant', 'Time', 'MidPeriod', 'PopMale', 'PopFemale', 'PopTotal', 'PopDensity']


([['4',
   'Afghanistan',
   '2',
   'Medium',
   '1950',
   '1950.5',
   '4099.243',
   '3652.874',
   '7752.117',
   '11.874'],
  ['4',
   'Afghanistan',
   '2',
   'Medium',
   '1951',
   '1951.5',
   '4134.756',
   '3705.395',
   '7840.151',
   '12.009'],
  ['4',
   'Afghanistan',
   '2',
   'Medium',
   '1952',
   '1952.5',
   '4174.45',
   '3761.546',
   '7935.996',
   '12.156'],
  ['4',
   'Afghanistan',
   '2',
   'Medium',
   '1953',
   '1953.5',
   '4218.336',
   '3821.348',
   '8039.684',
   '12.315'],
  ['4',
   'Afghanistan',
   '2',
   'Medium',
   '1954',
   '1954.5',
   '4266.484',
   '3884.832',
   '8151.316',
   '12.486'],
  ['4',
   'Afghanistan',
   '2',
   'Medium',
   '1955',
   '1955.5',
   '4318.945',
   '3952.047',
   '8270.992',
   '12.669'],
  ['4',
   'Afghanistan',
   '2',
   'Medium',
   '1956',
   '1956.5',
   '4375.8',
   '4023.073',
   '8398.873',
   '12.865'],
  ['4',
   'Afghanistan',
   '2',
   'Medium',
   '1957',
   '1957.5',
   '4437.157',
   '409

#### Problem 2 : Extracting Rows and Columns [1.5 POINTS] #### 

For this problem, you will implement a function ```get_by_axis(data, headers, element_id)``` that returns a row or column depending upon the arguments passed to it. It accepts 3 mandatory arguments.

1. ```data``` - Contents of the ```data``` variable specified in the previous exercise.
2. ```headers``` - Contents of the ```headers``` variable specified in the previous exercise.
3. ```element_id``` - Specifies which row or column needs to be extracted. For extracting a row, the integer index of the desired row must be passed as ```element_id```. For extracting a column, the (string) name of the column must be passed as ```element_id```. An integer ```element_id``` implies that a row needs to be fetched whereas a string ```element_id``` implies that a column needs to be fetched.

If the desired row or column is invalid, your code must return ```None```. An example of an invalid row is an index that lies beyond the last row of the data. Similarly, an invalid column is a column name that is not present in the original data.
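
The dispatch on the type of ```element_id``` described above could be sketched as follows. This is only one possible approach, not a reference solution: the name ```get_by_axis_sketch``` and the toy data are hypothetical, and treating negative indices as invalid is an assumption that the text above does not settle.

```python
def get_by_axis_sketch(data, headers, element_id):
    """Return a row (int element_id) or a column (str element_id), else None."""
    if isinstance(element_id, int):
        # Assumption: only indices 0 .. len(data) - 1 count as valid rows.
        if 0 <= element_id < len(data):
            return data[element_id]
        return None
    if isinstance(element_id, str):
        # A column is valid only if its name appears in the headers.
        if element_id in headers:
            col = headers.index(element_id)
            return [row[col] for row in data]
        return None
    # Any other type is neither a row index nor a column name.
    return None

# Hypothetical toy data for demonstration.
toy_headers = ['Location', 'Time']
toy_data = [['Afghanistan', '1950'], ['Afghanistan', '1951']]

print(get_by_axis_sketch(toy_data, toy_headers, 1))        # ['Afghanistan', '1951']
print(get_by_axis_sketch(toy_data, toy_headers, 'Time'))   # ['1950', '1951']
print(get_by_axis_sketch(toy_data, toy_headers, 5))        # None
print(get_by_axis_sketch(toy_data, toy_headers, 'PopAge')) # None
```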

The template for this function is provided below:

In [2]:
def get_by_axis(data, headers, element_id):
    # YOUR CODE HERE
    pass

#### Problem 3 : Extract Data Groups [2.5 POINTS] #### 

For this problem, you are asked to implement a function ```get_group(data, headers, condition)``` that returns a subset of the data filtered on a set of conditions. These conditions can be passed to the function in multiple ways through the argument ```condition```.

1. If ```condition``` is of the type ```int``` - Return the row at the index specified by integer ```condition```.

2. If ```condition``` is of the type ```str``` - Return the column with the column name specified by string ```condition```.

3. If ```condition``` is of the type ```dict``` - Return the rows that match the dictionary. Each ```key``` names the column on which to enforce a condition, and the corresponding ```value``` is a list of the values accepted in that column. For example, we might want to look at statistics for the country ```Netherlands```. In this case, we can write our condition dictionary as

```python
condition = {'Location': ["Netherlands"]}
```

In the above example, we only want rows where the ```Location``` column has the value ```"Netherlands"```.

Once we have the condition dictionary ready, we can call ```get_group``` and it should return the relevant rows in a list. For example,

```python
condition = {'Location':["Netherlands"]}
get_group(data, headers, condition)
```

returns 

```python
[['528',
  'Netherlands',
  '2',
  'Medium',
  '1950',
  '1950.5',
  '5005.355',
  '5036.696',
  '10042.051',
  '297.807'],
 ['528',
  'Netherlands',
  '2',
  'Medium',
  '1951',
  '1951.5',
  '5067.123',
  '5100.418',
  '10167.541',
  '301.528'],
 ...,
 ['528',
  'Netherlands',
  '207',
  'Lower 95 PI',
  '2100',
  '2100.5',
  '6433.321',
  '6418.939',
  '12880.956',
  '381.998']]
```

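The user can also provide multiple criteria to filter upon. In this case, you must return the data that satisfies __all__ conditions: a row is kept only if, for every key in the dictionary, its value in that column is one of the values listed for that key. For example, the following request: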

```python
get_group(data, headers, condition={"Time":["1950", "1972"], "Location":["Netherlands", "Spain"]})
```

returns:

```python
[['528',
  'Netherlands',
  '2',
  'Medium',
  '1950',
  '1950.5',
  '5005.355',
  '5036.696',
  '10042.051',
  '297.807'],
 ['528',
  'Netherlands',
  '2',
  'Medium',
  '1972',
  '1972.5',
  '6630.041',
  '6661.835',
  '13291.876',
  '394.184'],
 ['724',
  'Spain',
  '2',
  'Medium',
  '1950',
  '1950.5',
  '13507.071',
  '14562.663',
  '28069.734',
  '56.275'],
 ['724',
  'Spain',
  '2',
  'Medium',
  '1972',
  '1972.5',
  '16936.351',
  '17727.883',
  '34664.234',
  '69.495']]
```
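
The multi-criteria filtering shown above could be sketched as follows. This is a hedged illustration rather than a reference solution: the name ```filter_rows_sketch``` and the toy data are hypothetical, only the ```dict``` case is covered, and every key is assumed to be a valid column name (an unknown name would raise a ```ValueError``` here).

```python
def filter_rows_sketch(data, headers, condition):
    """Keep rows whose value in every listed column is among the accepted values."""
    # Translate column names into positional indices once, up front.
    indexed = [(headers.index(name), accepted)
               for name, accepted in condition.items()]
    return [
        row for row in data
        # A row passes only if it satisfies the condition on every column.
        if all(row[col] in accepted for col, accepted in indexed)
    ]

# Hypothetical toy data for demonstration.
toy_headers = ['Location', 'Time']
toy_data = [
    ['Netherlands', '1950'],
    ['Netherlands', '1951'],
    ['Spain', '1972'],
    ['Afghanistan', '1950'],
]

selected = filter_rows_sketch(
    toy_data,
    toy_headers,
    {"Time": ["1950", "1972"], "Location": ["Netherlands", "Spain"]},
)
print(selected)  # [['Netherlands', '1950'], ['Spain', '1972']]
```

Values within one list act as alternatives, while the different keys must all be satisfied, which matches the example output above.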

In [4]:
def get_group(data, headers, condition):
    # YOUR CODE HERE
    pass

#### References ####
[1] United Nations, Department of Economic and Social Affairs, Population Division (2019). World Population Prospects 2019, Online Edition. Rev. 1. 