### MOME Data Scientist Pre-Employment Assessment

Hey there, future data scholar!

Welcome to the assessment phase of our journey together. We're excited to see your analytical prowess and creative flair in action. Below, you'll find the lowdown on the dataset and the task at hand. Dive in and show us what you've got!

#### Dataset Details:

**Source of Data:**  
The dataset originates from a popular website '\x92\x9e\x9e\x9a\x9ddYY¡¡¡X\x97\x8b\x9c\x95\x97£\x9a\x9c\x99\x90\x8f\x9d\x9d\x99\x9cX\x8d\x99\x97' *(undisclosed, classified, redacted)* where university students can evaluate their teachers based on various criteria such as achievability, subject usefulness, kindness, preparedness, and presentation. These evaluations could provide valuable feedback to lecturers, administrators or students to improve the world of education.

**Teachers DataFrame:**  
- Dimensions: (32590, 20)  
- Peek into the lives of over 32,000 teachers! Columns include everything from names to department details.

**Criterias DataFrame:**  
- Dimensions: (119605, 6)  
- Criteria galore! This dataframe is all about averaged ratings and timestamps, giving you a glimpse into how teachers fare.

**Ratings DataFrame:**  
- Dimensions: (325554, 16)  
- It's raining ratings! Dive into a treasure trove of student feedback, complete with comments, dates, and more. Get ready to uncover insights like never before.

**Criteria Dictionary:**  
- Your trusty map to decoding criteria IDs. From "requirements difficulty" to "presentation style," this dictionary has got you covered.
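As a minimal sketch of how the dictionary can be used, the snippet below attaches readable names to the `rating_criteria_id` column of the long-format criterias table. The ID-to-name pairs are the ones the notebook defines in `criteria_dict`, and the three rows are copied from the `df_criterias` sample shown in the notebook; the `criterias_sample` and `criteria_name` names are illustrative assumptions.

```python
import pandas as pd

# ID-to-name mapping, as defined in the notebook's loading cell
criteria_dict = {
    11: "requirements_difficulty",
    12: "subject_usefulness",
    13: "teacher_helpfulness",
    14: "teacher_readiness",
    15: "presentation_style",
}

# Three rows copied from the df_criterias sample output
criterias_sample = pd.DataFrame({
    "rating_criteria_id": [15, 14, 12],
    "teacher_id": [23710, 7650, 21749],
    "rating": [2.26, 4.99, 2.80],
})

# Series.map looks each ID up in the dictionary; unknown IDs would become NaN
criterias_sample["criteria_name"] = criterias_sample["rating_criteria_id"].map(criteria_dict)
print(criterias_sample)
```

The same `.map(criteria_dict)` call should work on the full `df_criterias` frame once it is loaded.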

#### Candidate Task:

Your mission, should you choose to accept it, is to harness the power of this dataset to craft a reporting, analytic or predictive application for the website. We're talking about turning raw data into actionable insights that'll make waves in the education world. Here are some avenues you might explore:

1. **Automatic Description Generation for Teachers:** Let's get creative! Develop a model or algorithm that crafts juicy summaries for teachers, highlighting their strengths and quirks.

2. **Predictive Modeling for Teacher Grading:** Become a fortune-teller of sorts by building a model that predicts teacher grades based on student feedback. Time to put those machine learning chops to work!

3. **Engineering Teacher Average Grades:** Crack the code of teacher grades. Whether it's aggregating ratings or reverse engineering averages, show us your wizardry with numbers.

4. **Extending Analysis to Multiple Levels:** Think big! Dive deep into departments, subjects, locations, and time series. Hunt down trends, anomalies, and stories waiting to be told.

5. **Data Gathering Process Improvement:** Let's fine-tune our data game. Share your insights on how we can level up our data gathering process and suggest additional datasets for an even richer analysis.

6. **Proof of Concepts:** It's showtime! Whip up some snazzy prototypes or dashboards to demonstrate the magic of your pipeline or application.
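To make avenue 3 a little more concrete, here is one possible starting point, not a prescribed method: recompute each teacher's per-criterion mean from the individual ratings and compare it with the stored average. The column names follow the Ratings and Criterias dataframes, but every value below is hypothetical toy data, and a plain unweighted mean is only an assumption about how the site's averages are built.

```python
import pandas as pd

criteria_cols = [f"rating_criteria_id_{cid}" for cid in (11, 12, 13, 14, 15)]

# Hypothetical individual ratings (wide format, one row per rating)
ratings = pd.DataFrame({
    "teacher_id": [1, 1, 2],
    "rating_criteria_id_11": [5, 4, 2],
    "rating_criteria_id_12": [5, 3, 1],
    "rating_criteria_id_13": [4, 4, 1],
    "rating_criteria_id_14": [5, 5, 3],
    "rating_criteria_id_15": [5, 4, 2],
})

# Hypothetical stored averages (long format, one row per teacher and criterion)
criterias = pd.DataFrame({
    "teacher_id": [1, 1, 2],
    "rating_criteria_id": [11, 12, 11],
    "rating": [4.5, 4.0, 2.5],
})

# Reshape the ratings to long format and average per teacher and criterion
recomputed = (
    ratings.melt(id_vars="teacher_id", value_vars=criteria_cols,
                 var_name="criterion", value_name="score")
    .assign(rating_criteria_id=lambda d: d["criterion"].str.rsplit("_", n=1).str[-1].astype(int))
    .groupby(["teacher_id", "rating_criteria_id"], as_index=False)["score"].mean()
    .rename(columns={"score": "recomputed"})
)

# Compare stored and recomputed averages; a non-zero diff hints at weighting or filtering
comparison = criterias.merge(recomputed, on=["teacher_id", "rating_criteria_id"], how="left")
comparison["diff"] = (comparison["rating"] - comparison["recomputed"]).round(2)
print(comparison)
```

Rows with a large `diff` would be the natural place to start reverse engineering, for example by checking whether verified ratings or recent ratings carry more weight.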

Your submission is your canvas, so paint it with your problem-solving brilliance, analytical finesse, and a sprinkle of your unique style. We're not just looking for results; we're looking for the spark that sets you apart.


> "In the garden of life, the flowers of wisdom bloom not at the journey's end, but along the paths we tread with mindful steps."  
> — Edsger W. Dijkstra, after a yoga session


#### Submission Guidelines:

- No rules, but we recommend a [Google Colab](https://colab.google/) notebook environment. It's easy to set up and free for all!
- Pour your heart out in a written report detailing your approach, findings, and recommendations, and feel free to jazz it up with code snippets, visualizations, or any other bells and whistles that bring your work to life.
- Don't hold back! Share any additional insights, ideas, or musings that you think might tickle our fancy.

Ready to dazzle us? Go forth and conquer! If you've got any burning questions, don't hesitate to give us a shout.

Best,

**MOME DATA**


Importing

In [1]:
import pandas as pd
import numpy as np
np.random.seed(42)

Loading data and just a general rundown

In [2]:
loadpath = "..."

# Load the dataframes
df_teachers = pd.read_parquet(f"{loadpath}/teachers.parquet")
df_criterias = pd.read_parquet(f"{loadpath}/criterias.parquet")
df_ratings = pd.read_parquet(f"{loadpath}/ratings.parquet")

# Additional dictionary for criteria definitions
criteria_dict = {
    11: "requirements_difficulty",
    12: "subject_usefulness",
    13: "teacher_helpfulness",
    14: "teacher_readiness",
    15: "presentation_style",
}


# Print the dimensions and general properties of each dataframe.
# DataFrame.info() prints its own summary and returns None, so it is not wrapped in print().
print("Teachers DataFrame:")
print("Dimensions:", df_teachers.shape)
df_teachers.info()
print("\nCriterias DataFrame:")
print("Dimensions:", df_criterias.shape)
df_criterias.info()
print("\nRatings DataFrame:")
print("Dimensions:", df_ratings.shape)
df_ratings.info()

Teachers DataFrame:
Dimensions: (32590, 20)
<class 'pandas.core.frame.DataFrame'>
Index: 32590 entries, 0 to 51643
Data columns (total 20 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   teacher_id                  32590 non-null  int64  
 1   data_first_name             31970 non-null  object 
 2   data_last_name              31970 non-null  object 
 3   data_department_id          28921 non-null  float64
 4   data_department_name        28921 non-null  object 
 5   data_department_school_id   20093 non-null  float64
 6   data_department_created_at  28921 non-null  object 
 7   data_department_updated_at  28921 non-null  object 
 8   data_department_short_name  28921 non-null  object 
 9   data_department_website     28918 non-null  object 
 10  data_department_city        28918 non-null  object 
 11  data_department_region      28839 non-null  object 
 12  data_department_twitter     6729 non-null   object 
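
The non-null counts above can be turned into missing-value shares, which makes the sparse columns easier to spot. The sketch below uses the row count and four of the counts printed by `info()`; the variable names are illustrative assumptions.

```python
import pandas as pd

n_rows = 32590  # number of rows reported for df_teachers

# Non-null counts copied from the info() output above
non_null = pd.Series({
    "data_first_name": 31970,
    "data_department_id": 28921,
    "data_department_school_id": 20093,
    "data_department_twitter": 6729,
})

# Share of missing values per column, most incomplete first
missing_share = (1 - non_null / n_rows).sort_values(ascending=False)
print(missing_share.round(3))
```

On the loaded frame, `df_teachers.isna().mean()` should give the same shares for every column at once.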

In [3]:
df_teachers.sample(10)

Unnamed: 0,teacher_id,data_first_name,data_last_name,data_department_id,data_department_name,data_department_school_id,data_department_created_at,data_department_updated_at,data_department_short_name,data_department_website,data_department_city,data_department_region,data_department_twitter,data_department_active,data_department_old_id,data_department_slug,data_school_url,data_slug,data_rate_restriction,data_department
14950,6774,Nagy,Attila,39.0,Debreceni Egyetem Népegészségügyi Kar,,2010-04-08T15:32:41.000000Z,2023-09-15T17:41:40.000000Z,DE-NK,www.unideb.hu,Debrecen,Hajdú-Bihar,,1.0,43.0,debreceni-egyetem-nepegeszsegugyi-kar-39,,nagy-attila-6774,anyone,
27559,23317,Benczik,Vera,44.0,Eötvös Loránd Tudományegyetem Bölcsészettudomá...,70.0,2010-04-08T15:32:41.000000Z,2023-09-15T17:41:40.000000Z,ELTE-BTK,www.elte.hu,Budapest,Budapest,,1.0,52.0,eotvos-lorand-tudomanyegyetem-bolcseszettudoma...,,benczik-vera-23317,anyone,
7301,10912,Hollán,Miklós,58.0,Károli Gáspár Református Egyetem Állam- és Jog...,,2010-04-08T15:32:41.000000Z,2023-09-15T17:41:40.000000Z,KRE-ÁJK,www.kre.hu,Budapest,Budapest,,1.0,72.0,karoli-gaspar-reformatus-egyetem-allam-es-jogt...,,hollan-miklos-10912,anyone,
36384,29855,Nagy,Péter,110.0,Pécsi Tudományegyetem Bölcsészettudományi Kar,101.0,2010-04-08T15:32:41.000000Z,2023-09-15T17:41:40.000000Z,PTE-BTK,www.pte.hu,Pécs,Baranya,,1.0,134.0,pecsi-tudomanyegyetem-bolcseszettudomanyi-kar-110,,nagy-peter-pecsi-tudomanyegyetem-29855,anyone,
41446,30257,Dr.,Bölcskei Attila,12.0,Szent István Egyetem Tájépítészeti és Települé...,96.0,2010-04-08T15:32:41.000000Z,2023-09-15T17:41:40.000000Z,SZIE TÁJK,www.tajk.szie.hu,Budapest,Budapest,,1.0,14.0,szent-istvan-egyetem-tajepiteszeti-es-telepule...,,dr-bolcskei-attila-30257,anyone,
3660,9222,Prónai,Csaba,48.0,Eötvös Loránd Tudományegyetem Társadalomtudomá...,70.0,2010-04-08T15:32:41.000000Z,2023-09-15T17:41:40.000000Z,ELTE-TÁTK,www.elte.hu,Budapest,Budapest,,1.0,56.0,eotvos-lorand-tudomanyegyetem-tarsadalomtudoma...,,pronai-csaba-9222,anyone,
48631,23500,Scheibl,György,130.0,Szegedi Tudományegyetem Bölcsészettudományi Kar,91.0,2010-04-08T15:32:41.000000Z,2023-09-15T17:41:40.000000Z,SZTE-BTK,www.u-szeged.hu,Szeged,Csongrád,,1.0,160.0,szegedi-tudomanyegyetem-bolcseszettudomanyi-ka...,,scheibl-gyorgy-23500,anyone,
20222,7473,Tóth,Judit,50.0,Eszterházy Károly Katolikus Egyetem Bölcsészet...,73.0,2010-04-08T15:32:41.000000Z,2023-09-15T17:41:40.000000Z,EKKE-BMK,https://uni-eszterhazy.hu/bmk,Eger,Heves,,1.0,59.0,eszterhazy-karoly-katolikus-egyetem-bolcseszet...,,toth-judit-7473,anyone,
6422,3909,Balogh,Andrásné,21.0,Budapesti Műszaki és Gazdaságtudományi Egyetem...,69.0,2010-04-08T15:32:41.000000Z,2023-09-15T17:41:40.000000Z,BME-GTK,www.bme.hu,Budapest,Budapest,,1.0,25.0,budapesti-muszaki-es-gazdasagtudomanyi-egyetem...,,balogh-andrasne-budapest-muszaki-egyetem-3909,anyone,
36893,28623,Csima,Ferenc,56.0,Kaposvári Egyetem Gazdaságtudományi Kar,96.0,2010-04-08T15:32:41.000000Z,2023-09-15T17:41:40.000000Z,KE-GTK,www.u-kaposvar.hu,Kaposvár,Somogy,,1.0,70.0,kaposvari-egyetem-gazdasagtudomanyi-kar-56,,csima-ferenc-28623,anyone,
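
The department timestamps are stored as ISO 8601 strings (dtype `object`), so they need parsing before any time-based analysis. A minimal sketch, using two rows of values copied from the sample above; the `departments` frame and the `days_since_creation` column are illustrative assumptions.

```python
import pandas as pd

# Values copied from the df_teachers sample above
departments = pd.DataFrame({
    "data_department_short_name": ["DE-NK", "ELTE-BTK"],
    "data_department_created_at": ["2010-04-08T15:32:41.000000Z"] * 2,
    "data_department_updated_at": ["2023-09-15T17:41:40.000000Z"] * 2,
})

# The trailing "Z" marks UTC; utc=True yields timezone-aware timestamps
for col in ["data_department_created_at", "data_department_updated_at"]:
    departments[col] = pd.to_datetime(departments[col], utc=True)

# Whole days between creation and last update
departments["days_since_creation"] = (
    departments["data_department_updated_at"] - departments["data_department_created_at"]
).dt.days
print(departments)
```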


In [4]:
df_criterias.sample(10)

Unnamed: 0,teacher_criteria_id,rating_criteria_id,teacher_id,rating,created_at,updated_at
135869,286322,15,23710,2.26,,2023-09-14T19:20:35.000000Z
9918,269751,14,7650,4.99,,2023-09-14T19:20:35.000000Z
45591,381100,12,21749,2.8,,2023-09-17T11:02:19.000000Z
27002,280765,13,18563,2.78,,2023-09-14T19:20:35.000000Z
38825,283678,11,21743,5.0,,2023-09-14T19:20:35.000000Z
2174,283202,15,21375,4.15,,2023-09-14T19:20:35.000000Z
16089,264752,15,3078,4.65,,2023-09-14T19:20:35.000000Z
51609,281552,15,19273,4.74,,2023-09-14T19:20:35.000000Z
45106,340815,12,5580,3.08,,2023-09-17T11:02:19.000000Z
19697,375706,13,19565,4.89,,2023-09-17T11:02:19.000000Z
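
The criterias table is in long format: one row per teacher and criterion. For per-teacher comparisons it can help to pivot it into one row per teacher with a named column per criterion. The sketch below assumes at most one row per teacher and criterion; the rows are hypothetical toy data, not values from the dataset.

```python
import pandas as pd

criteria_dict = {
    11: "requirements_difficulty",
    12: "subject_usefulness",
    13: "teacher_helpfulness",
    14: "teacher_readiness",
    15: "presentation_style",
}

# Hypothetical long-format rows with the same columns as df_criterias
long_df = pd.DataFrame({
    "teacher_id": [101, 101, 101, 202, 202],
    "rating_criteria_id": [11, 12, 15, 11, 15],
    "rating": [4.2, 3.9, 4.7, 2.5, 3.1],
})

# One row per teacher, one column per criterion; missing combinations become NaN
wide = (
    long_df.pivot(index="teacher_id", columns="rating_criteria_id", values="rating")
    .rename(columns=criteria_dict)
    .rename_axis(columns=None)
)
print(wide)
```

`pivot` raises an error when a teacher has duplicate rows for a criterion; if that turns out to be the case in the real data, `pivot_table` with an explicit `aggfunc` is the usual alternative.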


In [5]:
df_ratings.sample(10)

Unnamed: 0,rating_id,show_comment,comment,up_votes,down_votes,verified,date,user_vote,teacher_id,subject_id,subject_name,rating_criteria_id_11,rating_criteria_id_12,rating_criteria_id_13,rating_criteria_id_14,rating_criteria_id_15
91986,288258,True,Ekkora sötét gyökér féleszű lenéző barmot...Il...,0,0,False,2021-01-30 02:49,,8192,1262,Irodalomelmélet 1.,1,1,1,1,1
387480,339730,True,Szakmailag és emberileg egyaránt remek!,0,0,True,2018-08-18 12:32,,26326,23287,Informatika és a viág,5,5,5,5,5
214031,188805,True,"Az egyik legmegbízhatóbb tudású tanár, főleg a...",0,0,False,2015-06-26 12:22,,8599,18885,Francia szövegelemzési technikák,5,5,5,5,4
255546,136261,True,"Az egyik legnagyobb arc, rengeteget tesz hozzá...",0,0,False,2013-09-05 09:07,,23458,14571,A szerző alakjai,5,5,4,4,5
318487,320996,True,"Beugrón nagyon apró hibát elfogad, de amúgy na...",0,0,True,2019-03-06 11:51,,15478,461,büntetőjog,2,5,1,4,2
255641,126034,True,,0,0,False,2013-04-19 02:26,,25917,2859,Sejtbiológia,4,3,5,3,4
288218,27088,True,jó tanár,0,0,False,2010-07-19 05:05,,21669,318,programozás,5,5,5,5,5
227424,183121,True,"Amikor már azt érzed az első két félév után, h...",0,0,False,2015-05-05 03:34,,23067,395,élettan,5,5,5,5,5
261555,213048,True,Alig tanít valamit. Folyton késik az előadások...,0,0,False,2016-06-13 04:13,,14583,13692,Önszerveződő Alacsony Dimenziós Rendszerek,2,1,1,1,1
337922,41364,True,blööööö,0,0,False,2011-01-16 05:21,,12431,77,Analízis,2,2,2,5,1
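
Each rating row carries five criterion scores and a date string, so a per-rating overall score and a year column are cheap first features. The sketch below uses three rows copied from the sample above (comments omitted). The unweighted row mean is an assumption rather than the site's own formula, and it treats all five criteria as "higher is better", which should be checked for `requirements_difficulty`.

```python
import pandas as pd

criteria_cols = [f"rating_criteria_id_{cid}" for cid in (11, 12, 13, 14, 15)]

# Three rows copied from the df_ratings sample above
ratings_sample = pd.DataFrame(
    [
        [320996, "2019-03-06 11:51", 15478, 2, 5, 1, 4, 2],
        [188805, "2015-06-26 12:22", 8599, 5, 5, 5, 5, 4],
        [41364, "2011-01-16 05:21", 12431, 2, 2, 2, 5, 1],
    ],
    columns=["rating_id", "date", "teacher_id", *criteria_cols],
)

# Parse the date strings, then derive the year and an unweighted overall score
ratings_sample["date"] = pd.to_datetime(ratings_sample["date"], format="%Y-%m-%d %H:%M")
ratings_sample["year"] = ratings_sample["date"].dt.year
ratings_sample["overall"] = ratings_sample[criteria_cols].mean(axis=1)
print(ratings_sample[["rating_id", "teacher_id", "year", "overall"]])
```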
