# Datascientest MLOps Training: FastAPI Quiz API
 
Author:
[Dominik Bursy](mailto:dominik.bursy@allianz.com)
 
Last Update: 06 June 2023
 
---
 
The objective of this notebook is to test the Datascience Quiz API.

For the documentation, see [API Docs](http://localhost:8000/docs) or [API Redoc](http://localhost:8000/redoc). Both follow the conventions described in the [FastAPI Metadata Tutorial](https://fastapi.tiangolo.com/tutorial/metadata/).

To start the API, run:
- `source venv/bin/activate`
- `uvicorn quiz_api_main:api --reload`
 
---
 
## Table of Contents

- [Load Packages](#packages)
- [API Authentication](#authentication)
- [Public Endpoints](#public)
- [User Endpoints](#user)
- [Admin Endpoints](#admin)

---

## Load Packages <a class="anchor" id="packages"></a>

In [34]:
import pandas as pd
import base64
import requests
import json
from urllib.parse import quote

---

## API Authentication <a class="anchor" id="authentication"></a>

Credentials are passed in the `Authorization` header. Its value is the word `Basic`, a space, and then the Base64 encoding of `username:password`.

Credentials:
- "admin": "4dm1N"
- "alice": "wonderland"
- "bob": "builder"
- "clementine": "mandarine"

[External Resource for Basic Authentication](https://dock2learn.com/tech/how-to-implement-basic-authentication-with-fastapi/)

In [35]:
print(base64.b64encode(b"admin:4dm1N").decode("utf-8"))
print(base64.b64encode(b"alice:wonderland").decode("utf-8"))
print(base64.b64encode(b"bob:builder").decode("utf-8"))
print(base64.b64encode(b"clementine:mandarine").decode("utf-8"))

YWRtaW46NGRtMU4=
YWxpY2U6d29uZGVybGFuZA==
Ym9iOmJ1aWxkZXI=
Y2xlbWVudGluZTptYW5kYXJpbmU=
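
The encoding step above can be wrapped in a small helper so the header does not have to be pasted by hand. This is a minimal sketch, not part of the API; the name `basic_auth_header` is hypothetical.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header (hypothetical helper)."""
    # Join the credentials with a colon, then Base64-encode the bytes.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("alice", "wonderland"))
```

`requests` can also build this header itself when a call is given `auth=("alice", "wonderland")`.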


## Public Endpoints <a class="anchor" id="public"></a>

In [36]:
# Specify endpoint
url = "http://localhost:8000/"

# Public endpoint: no Authorization header is required
headers = {
  'Content-Type': 'application/json',
}

response = requests.get(url, headers=headers)

print(response.status_code)
print(pd.Series(response.headers))
print(response.json())

200
date              Wed, 31 Jan 2024 10:35:40 GMT
server                                  uvicorn
content-length                               37
content-type                   application/json
dtype: object
Welcome to the Datascience Quiz API


## User Endpoints <a class="anchor" id="user"></a>

### User Authentication

Both regular users and admin users have access to the user endpoints.

In [37]:
url = "http://localhost:8000/user"

# Pass the Basic credentials in the Authorization header
headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Basic YWRtaW46NGRtMU4='
}

response = requests.get(url, headers=headers)
print(response.text)  # => the authenticated username

{"username":"admin"}


In [38]:
url = "http://localhost:8000/user"

# Pass the Basic credentials in the Authorization header
headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Basic YWxpY2U6d29uZGVybGFuZA=='
}

response = requests.get(url, headers=headers)
print(response.text)  # => the authenticated username

{"username":"alice"}


---

### Query Questionnaire

Users can request a questionnaire of 5, 10 or 15 MCQs for a given test type (`use`) and one or more categories (`subject`).
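
The request URL for this endpoint can be assembled with a small helper. The sketch below is an assumption-laden illustration: `build_quiz_url` is a hypothetical name, the path layout mirrors the cells in this notebook, and the allowed counts are taken from the error message the API returns for other values.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8000"  # same host as the cells in this notebook
ALLOWED_MCQ_COUNTS = (5, 10, 15)    # assumption: taken from the API's error message


def build_quiz_url(use, subjects, n_questions, base_url=BASE_URL):
    """Build the questionnaire URL (hypothetical helper)."""
    # Reject counts the API is known to refuse before sending a request.
    if n_questions not in ALLOWED_MCQ_COUNTS:
        raise ValueError(f"n_questions must be one of {ALLOWED_MCQ_COUNTS}")
    # Multiple subjects are joined with "|" and then percent-encoded.
    subject_part = quote("|".join(subjects))
    return f"{base_url}/user/{quote(use)}/{subject_part}/{n_questions}"


print(build_quiz_url("Total Boot Camp", ["data science", "machine-learning"], 10))
```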

In [39]:
df_questions = pd.read_excel('/Users/dominik.bursy/Documents/1_MLops/FastAPI/questions_en.xlsx')
df_questions['question_count'] = 1
print(df_questions[['use', 'subject', 'question_count']].groupby(['use', 'subject']).sum())

                                      question_count
use              subject                            
Positioning test Data Streaming                    3
                 Databases                         6
                 Distributed systems               7
                 Docker                            5
Total Boot Camp  data science                      8
                 machine-learning                  7
Validation test  Automation                       10
                 Classification                   10
                 Data Streaming                   10
                 Distributed systems              10


In [40]:
# Specify use, subject and number of MCQs
# Note: multiple subjects must be separated by |
use_input = quote('Total Boot Camp')
subject_input = quote('data science')
mcqs_input = quote('5')

url = "http://localhost:8000/user/" + use_input + "/" + subject_input + "/" + mcqs_input

# Pass the Basic credentials in the Authorization header
headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Basic YWxpY2U6d29uZGVybGFuZA=='
}

response = requests.get(url, headers=headers)
pd.DataFrame(json.loads(json.loads(response.text)))

Unnamed: 0,question,responseA,responseB,responseC,responseD
44,Are every dataset worth a Data Science project?,No.,"If it's big enough, yes.",Yes.,
48,"Your model is all done and working, what's next?",My project is done!,Analyze the results and tune the existing mode...,,
47,"When building a model, you have to",Look out for parameters that can be optimized ...,Train it on all the data available.,,
43,What are the first things you want to do when ...,Define the problem.,Choose the model you want to implement.,Obtain the data and check if it fits our stand...,Ask Paul what to do next.
51,Unsupervised learning ...,Is when the data we feed to our model is not l...,Allows to predict the value or the class of a ...,Allows data partitioning according to the feat...,


In [41]:
# Specify use, subject and number of MCQs
# Note: multiple subjects must be separated by |
use_input = quote('Total Boot Camp')
subject_input = quote('data science|machine-learning|Docker')
mcqs_input = quote('15')

url = "http://localhost:8000/user/" + use_input + "/" + subject_input + "/" + mcqs_input

# Pass the Basic credentials in the Authorization header
headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Basic YWxpY2U6d29uZGVybGFuZA=='
}

response = requests.get(url, headers=headers)
pd.DataFrame(json.loads(json.loads(response.text)))

Unnamed: 0,question,responseA,responseB,responseC,responseD
44,Are every dataset worth a Data Science project?,No.,"If it's big enough, yes.",Yes.,
48,"Your model is all done and working, what's next?",My project is done!,Analyze the results and tune the existing mode...,,
47,"When building a model, you have to",Look out for parameters that can be optimized ...,Train it on all the data available.,,
43,What are the first things you want to do when ...,Define the problem.,Choose the model you want to implement.,Obtain the data and check if it fits our stand...,Ask Paul what to do next.
51,Unsupervised learning ...,Is when the data we feed to our model is not l...,Allows to predict the value or the class of a ...,Allows data partitioning according to the feat...,
45,"When the dataset is all set and obtained, what...",Run a model on it and then do a series of stat...,Explore it and do a series of statistical test...,Pre-process it by cleaning it of missing value...,
42,Its applications are ...,Limited to a small amount of fields and use ca...,Close to unlimited and find use cases in almos...,,
54,Overfitting is,When the model fits too much the training data...,When the model takes too much time to train on...,When the algorithm can't store anymore the res...,
41,Data science is ...,A set of techniques and tools used to get valu...,A scientific approach to data acquisition.,A set of empirical approaches used to define t...,
55,A way to handle imbalanced datasets is,Filtering,Undersampling,Oversampling,


The API returns an error message if the requested number of MCQs is not 5, 10 or 15:

In [42]:
# Specify use, subject and number of MCQs
# Note: multiple subjects must be separated by |
use_input = quote('Total Boot Camp')
subject_input = quote('data science')
mcqs_input = quote('3')

url = "http://localhost:8000/user/" + use_input + "/" + subject_input + "/" + mcqs_input

# Pass the Basic credentials in the Authorization header
headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Basic YWxpY2U6d29uZGVybGFuZA=='
}

response = requests.get(url, headers=headers)
print(response.text)  # => "Error message"

{"detail":"Please select MCQs of 5,10, or 15"}


In [43]:
# Specify use, subject and number of MCQs
# Note: multiple subjects must be separated by |
use_input = quote('Total Boot Camp')
subject_input = quote('data science')
mcqs_input = quote('20')

url = "http://localhost:8000/user/" + use_input + "/" + subject_input + "/" + mcqs_input

# Pass the Basic credentials in the Authorization header
headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Basic YWxpY2U6d29uZGVybGFuZA=='
}

response = requests.get(url, headers=headers)
print(response.text)  # => "Error message"

{"detail":"Please select MCQs of 5,10, or 15"}
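
Error bodies like the one above carry their message under a `detail` key. A client may want to pull that message out instead of printing raw text. The following is a minimal sketch; `extract_detail` is a hypothetical helper and it only relies on the body shape shown in this notebook.

```python
import json


def extract_detail(body_text):
    """Return the 'detail' message from a JSON error body, or None."""
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        # The body was not JSON at all.
        return None
    # Only JSON objects can carry a "detail" key.
    if isinstance(payload, dict):
        return payload.get("detail")
    return None


print(extract_detail('{"detail":"Please select MCQs of 5,10, or 15"}'))
```

In the cells above it would be called as `extract_detail(response.text)`.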


---

### Verify Answer

In [44]:
df_questions

Unnamed: 0,question,subject,use,correct,responseA,responseB,responseC,responseD,remark,question_count
0,What does No-SQL stand for?,Databases,Positioning test,A,Not OnlySQL,NoSQL,Not all SQL,,,1
1,Cassandra and HBase are databases,Databases,Positioning test,C,relational database,object-oriented,column-oriented,graph-oriented,,1
2,MongoDB and CouchDB are databases,Databases,Positioning test,B,relational database,object-oriented,column-oriented,graph-oriented,,1
3,OrientDB and Neo4J are databases,Databases,Positioning test,D,relational database,object-oriented,column-oriented,graph-oriented,,1
4,"To index textual data, I can use",Databases,Positioning test,A,ElasticSearch,Neo4J,mysql,,,1
...,...,...,...,...,...,...,...,...,...,...
71,Which Spark library does not exist?,Data Streaming,Validation test,,SparkSQL,SparkML,Spark Streaming,Spark IO,,1
72,What does RDD mean?,Data Streaming,Validation test,,Raw distributed dataset,Redundant Distributed Dataset,Resilient Distributed DataSet,,,1
73,What is DAG?,Data Streaming,Validation test,,A representation of the tasks to be performed,A device that optimizes calculations,,,,1
74,Dstreams are defined by,Data Streaming,Validation test,,A time limit,A space limit,A randomly determined limit,All these dots,,1


In [45]:
question_input = quote('Overfitting is')
answer_input = quote('A')

url = "http://localhost:8000/answer/" + question_input + "/" + answer_input + "/"

# Pass the Basic credentials in the Authorization header
headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Basic YWxpY2U6d29uZGVybGFuZA=='
}

response = requests.get(url, headers=headers)
print(response.text)

{"Answer verification":"True"}


In [46]:
question_input = quote('Overfitting is')
answer_input = quote('B')

url = "http://localhost:8000/answer/" + question_input + "/" + answer_input + "/"

# Pass the Basic credentials in the Authorization header
headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Basic YWxpY2U6d29uZGVybGFuZA=='
}

response = requests.get(url, headers=headers)
print(response.text)

{"Answer verification":"False"}
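
Note that the verification result arrives as the string `"True"` or `"False"`, not as a JSON boolean. The sketch below shows one way to build the answer URL and convert the result; both function names are hypothetical. It passes `safe=""` to `quote` so that a `/` inside a question is escaped too, although whether the server accepts an escaped slash in a path segment is not verified here.

```python
import json
from urllib.parse import quote


def build_answer_url(question, answer, base_url="http://localhost:8000"):
    """Build the answer-verification URL (hypothetical helper)."""
    # quote() leaves "/" unescaped by default; safe="" escapes it as well.
    return f"{base_url}/answer/{quote(question, safe='')}/{quote(answer, safe='')}"


def parse_verification(body_text):
    """Map the API's string result to True/False, or None for other bodies."""
    payload = json.loads(body_text)
    if isinstance(payload, dict) and "Answer verification" in payload:
        return payload["Answer verification"] == "True"
    # Any other body, such as a plain JSON string, has no verdict.
    return None


print(build_answer_url("Overfitting is", "A"))
print(parse_verification('{"Answer verification":"False"}'))
```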


In [47]:
# No answer available
question_input = quote('Which Spark library does not exist?')
answer_input = quote('A')

url = "http://localhost:8000/answer/" + question_input + "/" + answer_input

# Pass the Basic credentials in the Authorization header
headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Basic YWxpY2U6d29uZGVybGFuZA=='
}

response = requests.get(url, headers=headers)
print(response.text)

"No answer available"


---

## Admin Endpoints <a class="anchor" id="admin"></a>


### Admin Authentication

Only admin users have access to the admin endpoints.

In [48]:
url = "http://localhost:8000/admin"

# Pass the Basic credentials in the Authorization header
headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Basic YWRtaW46NGRtMU4='
}

response = requests.get(url, headers=headers)
print(response.text)  # => the authenticated admin username

{"username":"admin"}


In [49]:
# Users should not have access to the admin endpoints

url = "http://localhost:8000/admin"

# Pass the Basic credentials in the Authorization header
headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Basic YWxpY2U6d29uZGVybGFuZA=='
}

response = requests.get(url, headers=headers)
print(response.text)  # => error message, since alice is not an admin

{"detail":"Incorrect username or password"}
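
To illustrate what such a check involves, here is a rough server-side sketch in plain Python. It is not the API's actual implementation, which is not shown in this notebook. The user table reuses the credentials listed earlier, and `ADMIN_USERS`, `decode_basic_header` and `is_authorized_admin` are hypothetical names.

```python
import base64
import hmac

# Credentials as listed in this notebook; the admin set is an assumption.
USERS = {"admin": "4dm1N", "alice": "wonderland", "bob": "builder", "clementine": "mandarine"}
ADMIN_USERS = {"admin"}


def decode_basic_header(header_value):
    """Split a 'Basic <token>' header into (username, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic" or not token:
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


def is_authorized_admin(header_value):
    """True only for known admin users with a matching password."""
    username, password = decode_basic_header(header_value)
    expected = USERS.get(username)
    if expected is None or username not in ADMIN_USERS:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(password.encode("utf-8"), expected.encode("utf-8"))


print(is_authorized_admin("Basic YWxpY2U6d29uZGVybGFuZA=="))
```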


---

### Post Question

In [50]:
url = "http://localhost:8000/admin/post_quesiton"

# Pass the Basic credentials in the Authorization header
headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Basic YWRtaW46NGRtMU4='
}

new_question = {
    "question": "What is a machine learning (ML) model pipeline?",
    "subject": "machine-learning",
    "use": "Total Boot Camp",
    "correct": "A",
    "responseA": "A technical infrastructure used to automatically manage ML processes",
    "responseB": "A technical infrastructure used to import Python packages",
    "responseC": "A technical infrastructure used to run Docker images",
    "responseD": "A pipe that Super Mario uses to get underground",
    "remark": "Example Question"
}

response = requests.post(url, json=new_question, headers=headers)
pd.DataFrame(json.loads(response.json()))
#response.json()

Unnamed: 0,question,subject,use,correct,responseA,responseB,responseC,responseD,remark
76,What is a machine learning (ML) model pipeline?,machine-learning,Total Boot Camp,A,A technical infrastructure used to automatical...,A technical infrastructure used to import Pyth...,A technical infrastructure used to run Docker ...,A pipe that Super Mario uses to get underground,Example Question
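
Before posting, it can help to check the payload locally for misspelled or missing keys. The sketch below assumes the expected field names are the columns returned in the response above; which fields the API actually requires is not documented in this notebook, and `check_question_payload` is a hypothetical helper.

```python
# Assumption: field names mirror the columns returned by the endpoint.
EXPECTED_FIELDS = (
    "question", "subject", "use", "correct",
    "responseA", "responseB", "responseC", "responseD", "remark",
)


def check_question_payload(payload):
    """Return a list of problems found in a question payload (empty if none)."""
    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if name not in payload]
    problems += [f"unexpected field: {name}" for name in payload if name not in EXPECTED_FIELDS]
    # The correct answer should point at one of the four response slots.
    correct = payload.get("correct")
    if correct is not None and correct not in ("A", "B", "C", "D"):
        problems.append(f"correct must be one of A-D, got {correct!r}")
    return problems


example = {name: "" for name in EXPECTED_FIELDS}
example["correct"] = "A"
print(check_question_payload(example))
```

An empty list means the payload has the expected shape, so `check_question_payload(new_question)` could be called before `requests.post`.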


In [51]:
# Verify answer
question_input = quote('What is a machine learning (ML) model pipeline?')
answer_input = quote('A')

url = "http://localhost:8000/answer/" + question_input + "/" + answer_input

# Pass the Basic credentials in the Authorization header
headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Basic YWxpY2U6d29uZGVybGFuZA=='
}

response = requests.get(url, headers=headers)
print(response.text)

{"Answer verification":"True"}


---