# Python Recap

1. Rebuilding programming from scratch
2. Variables, Data Types, and Control Structures
3. Functions

## General Thoughts

Programming is about automation: you try to avoid repeating the same actions. You do so by abstracting the action away from the particular case, i.e. you look for similarities on a higher level.

## Rebuilding last year's exam questions from scratch

To understand all the intricate steps of abstraction that you perform every day (but sometimes have to make explicit to a machine), we will go through last year's exercise step by step.

### Exam Question

a) You receive a list of ICD-10 codes (International Classification of Diseases) and are asked to translate them to their string representations and provide a description in a tabular format.

Write a Python program that implements the following instructions:

- Use "http://icd10api.com/?" to get the data for each code (Parameters: code=, desc=long, r=json)
- Store them in an adequate data structure
- Transform this data structure into a table containing two columns (Name, Description). Remember to filter and set the index accordingly

Codes:

- "U07.1"
- "F42"
- "G93."

Hint:

- Make use of comments to describe your approach in plain English. 
- Also use comments, if you cannot find a proper solution to a subproblem.
- Remove the "pass" statement from the code before editing a function.

In [7]:
import pandas as pd
import requests

def get_icd_codes(icd_codes):
    pass

def print_icd_codes():
    pass

### Understanding the task

Please get me the "real names" (String Representations) behind these codes (ICD10).

What are the ways we could do that?

#### 1. Browse to http://ICD10api.com and get them manually? (Use a browser and Excel; note down your steps.)

- "U07.1"
- "F42"
- "G93."

--> YOUR ANSWER

#### Question: How scalable would these approaches be? Estimate the time you would need to do it manually vs. writing the program. How many codes would it take for the automation to be faster than your manual approach?

--> YOUR ANSWER
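One way to reason about the break-even point is a small calculation. This is only a sketch: the function name `break_even_codes` and the numbers (30 minutes to write the program, 2 minutes per manual lookup) are assumptions for illustration, so plug in your own estimates.

```python
import math


def break_even_codes(setup_minutes, manual_minutes_per_code, automated_minutes_per_code=0.0):
    """Smallest number of codes at which automation is strictly faster than manual work."""
    saved_per_code = manual_minutes_per_code - automated_minutes_per_code
    if saved_per_code <= 0:
        raise ValueError("automation must save time per code")
    # Manual effort:    n * manual
    # Automated effort: setup + n * automated
    # Automation wins as soon as n > setup / saved_per_code
    return math.floor(setup_minutes / saved_per_code) + 1


# Assumed numbers: 30 minutes to write the script, 2 minutes per manual lookup
print(break_even_codes(setup_minutes=30, manual_minutes_per_code=2))  # 16
```

With these assumed numbers, 15 codes take equally long either way, and from the 16th code on the program is ahead.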

#### 2. Use a command-line tool to get each of them manually? (Use cURL and Excel)

Read more on cURL here:
https://flaviocopes.com/http-curl/#perform-an-http-get-request

--> YOUR ANSWER

In [None]:
50% Automation:     
curl "http://icd10api.com/?code=G93.&desc=short&r=json" >> icd10.txt
curl "http://icd10api.com/?code=U07.1&desc=short&r=json" >> icd10.txt
curl "http://icd10api.com/?code=F42&desc=short&r=json" >> icd10.txt

20% Manual Labour
Transform it to JSON by separating each entry with a comma and putting everything into square brackets

30% Excel
Open icd10.txt in Excel and remove unnecessary columns
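The "manual labour" step could itself be automated. The sketch below assumes that icd10.txt contains the JSON responses written back to back, which is what appending with `>>` would produce. The helper name `parse_concatenated_json` is made up, and the sample string is an abbreviated version of the responses shown in this notebook.

```python
import json


def parse_concatenated_json(text):
    """Split a string of back-to-back JSON objects into a list of Python objects."""
    decoder = json.JSONDecoder()
    records = []
    position = 0
    while position < len(text):
        # Skip whitespace or newlines between two objects
        if text[position].isspace():
            position += 1
            continue
        # raw_decode returns the parsed object and the index where it ended
        record, position = decoder.raw_decode(text, position)
        records.append(record)
    return records


# Abbreviated sample instead of the real file, so this runs offline
sample = (
    '{"Name": "G93", "Description": "Other disorders of brain"}'
    '{"Name": "U071", "Description": "COVID-19, virus identified"}'
)
records = parse_concatenated_json(sample)
print(json.dumps(records, indent=2))
```

To use it on the real file, you would pass `open("icd10.txt").read()` instead of `sample`.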

#### 3. Use Python and get each of them individually? (Use Requests library)

--> YOUR ANSWER

In [28]:
import requests as req

req.get("http://icd10api.com/?code=G93.&desc=short&r=json")

<Response [200]>

In [29]:
dir(req.get("http://icd10api.com/?code=G93.&desc=short&r=json"))

['__attrs__',
 '__bool__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__nonzero__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_content',
 '_content_consumed',
 '_next',
 'apparent_encoding',
 'close',
 'connection',
 'content',
 'cookies',
 'elapsed',
 'encoding',
 'headers',
 'history',
 'is_permanent_redirect',
 'is_redirect',
 'iter_content',
 'iter_lines',
 'json',
 'links',
 'next',
 'ok',
 'raise_for_status',
 'raw',
 'reason',
 'request',
 'status_code',
 'text',
 'url']
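Most of the names in this listing start with an underscore and are rarely what you are looking for. A small filter makes such listings easier to read. The helper name `public_names` is made up, and the example inspects a plain string so that it runs without a network connection.

```python
def public_names(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


# Demonstrated on a string; any Python object works the same way
print(public_names("G93."))
```

Applied to the response object, this would leave only the names from `apparent_encoding` to `url` in the listing above, including `json`, `status_code`, and `text`.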

In [30]:
req.get("http://icd10api.com/?code=G93.&desc=short&r=json").json() 

{'Name': 'G93',
 'Description': 'Other disorders of brain',
 'Valid': '0',
 'Inclusions': [],
 'ExcludesOne': [],
 'ExcludesTwo': [],
 'Type': 'ICD-10-CM',
 'Response': 'True'}

In [31]:
req.get("http://icd10api.com/?code=U07.1&desc=short&r=json").json() 

{'Name': 'U071',
 'Description': 'COVID-19, virus identified',
 'Valid': '1',
 'Inclusions': [],
 'ExcludesOne': [],
 'ExcludesTwo': [],
 'Type': 'ICD-10-CM',
 'Response': 'True'}

In [35]:
req.get("http://icd10api.com/?code=F42&desc=short&r=json").json()

{'Name': 'F42',
 'Description': 'Obsessive-compulsive disorder',
 'Valid': '0',
 'Inclusions': [],
 'ExcludesOne': [],
 'ExcludesTwo': ['obsessive-compulsive personality (disorder) (F60.5)',
  'obsessive-compulsive symptoms occurring in depression (F32-F33)',
  'obsessive-compulsive symptoms occurring in schizophrenia (F20.-)'],
 'Type': 'ICD-10-CM',
 'Response': 'True'}

In [36]:
req.get("http://icd10api.com/?code=G93.&desc=short&r=json").json() 
req.get("http://icd10api.com/?code=U07.1&desc=short&r=json").json() 
req.get("http://icd10api.com/?code=F42&desc=short&r=json").json()

{'Name': 'F42',
 'Description': 'Obsessive-compulsive disorder',
 'Valid': '0',
 'Inclusions': [],
 'ExcludesOne': [],
 'ExcludesTwo': ['obsessive-compulsive personality (disorder) (F60.5)',
  'obsessive-compulsive symptoms occurring in depression (F32-F33)',
  'obsessive-compulsive symptoms occurring in schizophrenia (F20.-)'],
 'Type': 'ICD-10-CM',
 'Response': 'True'}

## Variables, Data Types, and Control Structures

Which variables can you identify in the above example? Why would it make sense to introduce them?

Let's try and work our way from where we are:

In [40]:
a = req.get("http://icd10api.com/?code=G93.&desc=short&r=json").json() 
b = req.get("http://icd10api.com/?code=U07.1&desc=short&r=json").json() 
c = req.get("http://icd10api.com/?code=F42&desc=short&r=json").json()

print(a, b, c)

{'Name': 'G93', 'Description': 'Other disorders of brain', 'Valid': '0', 'Inclusions': [], 'ExcludesOne': [], 'ExcludesTwo': [], 'Type': 'ICD-10-CM', 'Response': 'True'} {'Name': 'U071', 'Description': 'COVID-19, virus identified', 'Valid': '1', 'Inclusions': [], 'ExcludesOne': [], 'ExcludesTwo': [], 'Type': 'ICD-10-CM', 'Response': 'True'} {'Name': 'F42', 'Description': 'Obsessive-compulsive disorder', 'Valid': '0', 'Inclusions': [], 'ExcludesOne': [], 'ExcludesTwo': ['obsessive-compulsive personality (disorder) (F60.5)', 'obsessive-compulsive symptoms occurring in depression (F32-F33)', 'obsessive-compulsive symptoms occurring in schizophrenia (F20.-)'], 'Type': 'ICD-10-CM', 'Response': 'True'}


### DRY - Don't Repeat Yourself

Let's refactor our simple example. One of the best practices in software engineering is DRY (Don't Repeat Yourself). So, where is the above code repetitive?

- json() is used 3 times
- req.get is used 3 times
- The URL (http://icd10api.com/) is used 3 times
- The code parameter (?code=<...>) is used 3 times
- The parameters to the url ("&desc=short&r=json") are used 3 times

In [47]:
# Code
import requests as req
import pandas as pd 

url = "http://icd10api.com/"
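# NOTE: '093' is presumably a typo for 'G93'; the API resolves it to an
# ICD-10-PCS entry, which is why the third row of the output differs.
icd_codes = ['U071', 'F42', '093']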

response = []

for code in icd_codes:
    response.append(req.get(url + "?code=" + code + "&desc=short&r=json").json())
    
df = pd.DataFrame(response)
df

# Simple Data Types (String, Integer, Float, Binary,...)
# Complex Data Types (List, Dict,...)
# Data Frame (List of Dicts)



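|   | Name | Description | Valid | Inclusions | ExcludesOne | ExcludesTwo | Type | Response |
|---|------|-------------|-------|------------|-------------|-------------|------|----------|
| 0 | U071 | COVID-19, virus identified | 1 | [] | [] | [] | ICD-10-CM | True |
| 1 | F42 | Obsessive-compulsive disorder | 0 | [] | [] | [obsessive-compulsive personality (disorder) (... | ICD-10-CM | True |
| 2 | 093 | Ear, Nose, Sinus, Control | 0 | NaN | NaN | NaN | ICD-10-PCS | True |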
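The string concatenation inside the loop can also be expressed as a dictionary of query parameters, which keeps the parts that never change in one place. The sketch below uses only the standard library so you can inspect the resulting URLs without sending a request. The names `shared_params` and `build_url` are made up for illustration.

```python
from urllib.parse import urlencode

url = "http://icd10api.com/"
# Parameters that are identical for every request
shared_params = {"desc": "short", "r": "json"}


def build_url(code):
    """Combine the base URL, the code, and the shared parameters into one URL."""
    params = {"code": code, **shared_params}
    return url + "?" + urlencode(params)


for code in ["U07.1", "F42", "G93."]:
    print(build_url(code))
```

Only the code changes from one URL to the next; everything else is defined exactly once.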


## Data Types

If we look at the output we are currently generating, it is three JSON-formatted strings. Is this clever if we want to change the output format now or later?

- Simple Data Types (String, Integer, Float, Binary,...)
- Complex Data Types (List, Dict,...)
- Data Frame (List of Dicts)
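The three levels can be illustrated with values taken from the responses above. This is a sketch: reading "Binary" as Python's `bool` is an assumption, and the float value is made up.

```python
# Simple data types
name = "U071"            # str
valid = 1                # int
elapsed_seconds = 0.42   # float (made-up value)
is_ok = True             # bool

# Complex data types
codes = ["U07.1", "F42", "G93."]  # list
record = {"Name": "U071", "Description": "COVID-19, virus identified"}  # dict

# A list of dicts is the shape that a data frame can be built from
records = [record, {"Name": "F42", "Description": "Obsessive-compulsive disorder"}]

for value in (name, valid, elapsed_seconds, is_ok, codes, record, records):
    print(type(value).__name__, value)
```

Each dict becomes one row and each key one column once the list is handed to `pd.DataFrame`, as in the cell above.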

### REUSE CODE

In [1]:
# Code
import requests as req
import pandas as pd 


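# This function takes the URL of a JSON web service, the fixed query
# parameters, the name of the parameter that carries the code,
# and a list of identifiers (e.g. codes), and returns a data frame
def get_codes(url, params, code_param, codes):
    response = []
    
    for code in codes:
        # Copy the shared parameters so the caller's dict is not modified
        query = {**params, code_param: code}
        response.append(req.get(url=url, params=query).json())
    
    return pd.DataFrame(response)


# Metadata
url = "http://icd10api.com/"
params = {'code': '', 'desc': 'long', 'r': 'json'}
code_param = 'code'

# Data for requests
# NOTE: '093' is presumably a typo for 'G93'; the API resolves it to an
# ICD-10-PCS entry, which is why the third row of the output differs.
codes = ['U071', 'F42', '093']

print(get_codes(url, params, code_param, codes))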

   Name                    Description Valid Inclusions ExcludesOne  \
0  U071     COVID-19, virus identified     1         []          []   
1   F42  Obsessive-compulsive disorder     0         []          []   
2   093      Ear, Nose, Sinus, Control     0        NaN         NaN   

                                         ExcludesTwo        Type Response  
0                                                 []   ICD-10-CM     True  
1  [obsessive-compulsive personality (disorder) (...   ICD-10-CM     True  
2                                                NaN  ICD-10-PCS     True  
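The exam question also asks for a table with only Name and Description and a sensible index. A possible last step is sketched below on records copied from the responses above, so it runs without calling the API. Using Name as the index is one reading of "set the index accordingly", and the function name `to_name_description_table` is made up.

```python
import pandas as pd

# Abbreviated records taken from the API responses shown in this notebook
records = [
    {"Name": "U071", "Description": "COVID-19, virus identified", "Valid": "1", "Type": "ICD-10-CM"},
    {"Name": "F42", "Description": "Obsessive-compulsive disorder", "Valid": "0", "Type": "ICD-10-CM"},
    {"Name": "G93", "Description": "Other disorders of brain", "Valid": "0", "Type": "ICD-10-CM"},
]


def to_name_description_table(records):
    """Keep only Name and Description and use Name as the row index."""
    df = pd.DataFrame(records)
    return df[["Name", "Description"]].set_index("Name")


table = to_name_description_table(records)
print(table)
```

With Name as the index, a single description can be looked up with `table.loc["F42", "Description"]`.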
