# Research Question: Can We Predict the Success of English Tutors on the Preply Platform?

### by: Guy Mizrahi and Eden Tzagai

![title](teacher_english.webp)

Preply is an online platform that facilitates global connections between students and tutors.
Choosing a suitable tutor matters because the tutor shapes how personalized the lessons are, how well the subject is taught and communicated, and how motivated and supported the student feels.

We chose this topic because of its potential to improve educational outcomes. By identifying the key factors behind effective teaching, we aim to help students find the most suitable tutors and thereby improve their learning experience.

This study examines whether the success of English tutors on the Preply platform can be predicted. Specifically, we ask whether a tutor's level of success can be forecast from variables available on the platform. Answering this question contributes to the field of online education by identifying indicators that predict tutor success on Preply.

### Our dataset consists of the following columns:
- Name: The name of the tutor
- Country: The tutor's country of origin
- English Level: The level of English proficiency of the tutor
- Price: The specified fee for tutoring services
- Diploma: The educational degree accredited by the website
- Certificate: The certification recognized and approved by the website
- Response Time: The average time taken by the tutor to respond
- Number Of Lessons: The total number of lessons conducted by the tutor
- Stars: The rating awarded to the tutor based on student feedback
- Reviews: The number of reviews received by the tutor
- **Popularity Score: Our target column, representing the popularity score of each tutor. It does not appear on the website; we derive it from the combination of star rating and number of reviews.**


# Data Collection:

On the Preply platform, there is a comprehensive list of online English tutors available at the following link: [link](https://preply.com/en/online/english-tutors).
Each page contains the profiles of 10 tutors, and there are approximately 1500 pages in total.
Our scraper goes through each page, opens the individual tutor profile cards, and extracts the necessary information.
As a precaution against failures, the scraper saves the data collected so far to a CSV file after every 100 tutors.
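
The periodic backup described above can be sketched with the standard `csv` module. This is a minimal illustration under stated assumptions, not the scraper's actual backup code: the helper names `checkpoint_rows` and `collect_with_backups`, the file name `backup_sketch.csv`, and the placeholder rows are all invented for the example.

```python
import csv
import os
import tempfile

# Column order assumed for this sketch; it mirrors the columns listed above.
COLUMNS = ["Name", "Country", "English Level", "Price", "Diploma",
           "Certificate", "Response Time", "No Of Lessons", "Stars", "Reviews"]


def checkpoint_rows(rows, path, columns=COLUMNS):
    """Rewrite the backup file with every row collected so far."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        writer.writerows(rows)


def collect_with_backups(records, path, every=100):
    """Collect records, writing a backup after every `every` records."""
    rows = []
    saves = 0
    for record in records:
        rows.append(record)
        if len(rows) % every == 0:
            checkpoint_rows(rows, path)
            saves += 1
    checkpoint_rows(rows, path)  # final save so the last partial batch is kept
    return rows, saves + 1


# Hypothetical demo data: 250 placeholder rows, one value per column.
demo_records = [[f"tutor_{i}"] + [""] * (len(COLUMNS) - 1) for i in range(250)]
backup_path = os.path.join(tempfile.gettempdir(), "backup_sketch.csv")
rows, n_saves = collect_with_backups(demo_records, backup_path, every=100)
```

Rewriting the whole file at each checkpoint keeps the logic simple and means the backup is always a complete, readable CSV, at the cost of some redundant writes.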

##### Libraries import

In [1]:
from tqdm import tqdm
import pandas as pd
import requests
import numpy as np
from bs4 import BeautifulSoup
import time
import random
import csv

In [2]:
# All the variables and arrays we will need

all_links = [] # Will save the links of all profiles from the site
url = "https://preply.com/en/online/english-tutors" # The necessary link template
page_number = 1 # An iterator for traversing all pages 

#************************************************************************************#

In [None]:
# Loop over every listing page on the site (pages 1 to 1496)

for page_number in tqdm(range(1, 1497)):

    # Fetch and parse the listing page
    current_url = f"{url}?page={page_number}"
    response = requests.get(current_url)
    print(current_url)
    print(response)
    time.sleep(random.randint(1, 4))  # pause between requests
    soup = BeautifulSoup(response.content, 'html.parser')
    mtag = soup.find_all("li", attrs={"class": "styles_TutorCardWrapper__0Awqa"})  # all tutor cards on the page
    #************************************************************************************#

    # Collect the profile link from every tutor card on the current page
    # (no pause is needed here because no request is sent)
    for user in mtag:
        link = user.find("a").get("href")  # the profile link inside the card
        all_links.append(link)
    #************************************************************************************#
#************************************************************************************#
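
The URL handling in the loop above can be isolated into two small helpers. The sketch below is an assumption-laden illustration rather than part of the original scraper: `page_urls` and `absolute_link` are hypothetical names, and whether Preply's `href` values are relative or absolute is not stated in this notebook, so the `urljoin` step simply covers both cases.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "https://preply.com/en/online/english-tutors"


def page_urls(base_url, first_page, last_page):
    """Yield the listing URL for every page from first_page to last_page inclusive."""
    for page in range(first_page, last_page + 1):
        yield f"{base_url}?{urlencode({'page': page})}"


def absolute_link(href, base_url=BASE_URL):
    """Resolve an href against the listing URL; absolute hrefs pass through unchanged."""
    return urljoin(base_url, href)


first_three = list(page_urls(BASE_URL, 1, 3))
# "/en/tutor/12345" is a made-up path used only to show the resolution step.
resolved = absolute_link("/en/tutor/12345")
```

Resolving every link before storing it means the later profile loop can pass each entry straight to `requests.get` without checking its form.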

In [3]:
# Helper used for backups: builds a DataFrame from the rows collected so far

def save_file(data):
    df = pd.DataFrame(data, columns=['Name','Country','English Level','Price','Diploma','Certificate','Response Time','No Of Lessons','Stars','Reviews'])
    return df

#************************************************************************************#

In [4]:
# All the variables and arrays we will need

data = [] # Will hold the information of all profiles
fail_links = [] # Will hold the links of profiles we were unable to fetch
data_row = [] # Will hold the information of the profile currently being processed
counter = 0 # Counts processed profiles; used to trigger the backup

#************************************************************************************#

In [None]:
# Loop over every profile link collected in all_links

for url in tqdm(all_links):
    
    data_row = []
    # Back up the collected data every 100 profiles, in case of failures beyond our control (internet connection, etc.)
    if (counter%100) == 0:
        file = save_file(data)
        file.to_csv(r'C:\Users\User\Desktop\לימודים\מדעי המחשב\מדעי הנתונים\Backup.csv')
    #************************************************************************************#
    
    
    # Fetch and parse the profile page; links that cannot be fetched are recorded and skipped
    try:
        r = requests.get(url)
    except requests.RequestException:
        fail_links.append(url)
        continue
    soup = BeautifulSoup(r.content, 'html.parser')
    time.sleep(random.randint(1,3)) # We had trouble parsing some pages, so we pause after each request
    #************************************************************************************#
    
    
    
    
    # The information we want to extract from each link
    Name = soup.find("h2",attrs = {"class":"styles_name__hxfD2"})
    Country = soup.find("img",attrs = {"class":"styles_flag__fK4O5"})
    Level = soup.find("span",attrs = {"class":"styles_levelBadge__SmCyl"}) 
    Price = soup.find("div",attrs = {"class":"styles_PriceIndicatorValue__ndpfb"})
    Lessons = soup.find("span",attrs = {"class":"styles_totalLessons__VRT0h"}) 
    Stars = soup.find("div",attrs = {"class":"styles_RatingIndicatorRating__h_dIR"})
    Reviews = soup.find("div",attrs = {"class":"styles_ReviewsNumber__9r_a6"})
    Response = soup.find("span",attrs = {"class":"styles_ResponseTimeText__1x_eT"})
    Diploma = soup.find("span",attrs = {"class":"styles_diploma__gz7I2"})
    Certificate = soup.find("span",attrs = {"class":"styles_diploma__E7bks"})
    time.sleep(random.randint(1,3))
    #************************************************************************************#

    
    # Not every field exists on every profile, so each value is read inside an
    # 'if' or 'try/except' and replaced with NaN when it is missing
    try:
        data_row.append(Name.text)
    except AttributeError:
        data_row.append(np.nan)

    try:
        if Country is None:
            # Fall back to the text label when the flag image is missing
            Country = soup.find("span",attrs = {"class":"khPkeq _3pLfSK _1oG1KT"})
            data_row.append(Country.text)
        else:
            Country2 = Country['alt']
            data_row.append(Country2)
    except (AttributeError, KeyError):
        data_row.append(np.nan)

    try:
        data_row.append(Level.text)
    except AttributeError:
        data_row.append(np.nan)

    try:
        data_row.append(Price.text)
    except AttributeError:
        data_row.append(np.nan)
    
    # Append the remaining fields in the same order as the DataFrame columns:
    # Diploma, Certificate, Response Time, No Of Lessons, Stars, Reviews
    if Diploma is not None:
        data_row.append(Diploma.text)
    else:
        data_row.append(np.nan)

    if Certificate is not None:
        data_row.append(Certificate.text)
    else:
        data_row.append(np.nan)

    if Response is not None:
        data_row.append(Response.text)
    else:
        data_row.append(np.nan)

    if Lessons is not None:
        data_row.append(Lessons.text)
    else:
        data_row.append(np.nan)

    if Stars is not None:
        data_row.append(Stars.text)
    else:
        data_row.append(np.nan)

    if Reviews is not None:
        data_row.append(Reviews.text)
    else:
        data_row.append(np.nan)
    #************************************************************************************#
   



    
    data.append(data_row) # Added each profile to our final database
    counter += 1 
    
#************************************************************************************#
#end of loop
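
The repeated `if` and `try` blocks in the loop above all apply one rule: use the tag's text if the tag was found, otherwise store NaN. A small helper can express that rule once. The sketch below is only an illustration: `text_or_nan` and `build_row` are hypothetical names, the `SimpleNamespace` objects stand in for BeautifulSoup tags, and the `.strip()` call is an addition that the original loop does not perform.

```python
import math
from types import SimpleNamespace


def text_or_nan(tag):
    """Return the tag's text without surrounding whitespace, or NaN if the tag is missing."""
    if tag is None:
        return float("nan")
    return tag.text.strip()


def build_row(tags):
    """Turn a list of found tags (or None values) into one row of values."""
    return [text_or_nan(tag) for tag in tags]


# Stand-ins for BeautifulSoup results: any object with a .text attribute works here.
found_name = SimpleNamespace(text="  Jane D. ")
missing_price = None
row = build_row([found_name, missing_price])
```

Passing the tags in the same order as the DataFrame columns keeps every value under its intended header.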

In [5]:
# Create a DataFrame of the profile links that could not be fetched (the failed links)
fail_links_df = pd.DataFrame(fail_links,columns=['Fail Links'])
fail_links_df 

Unnamed: 0,Fail Links
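
Links recorded in `fail_links` could be given a second chance before being dropped. The sketch below shows one possible retry loop with an increasing delay; it is not part of the original scraper. `retry_failed` and `flaky_fetch` are hypothetical names, and `fetch` is any callable supplied by the caller that raises on failure.

```python
import time


def retry_failed(links, fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Try each link up to `attempts` times; return (results, still_failed)."""
    results, still_failed = {}, []
    for link in links:
        for attempt in range(attempts):
            try:
                results[link] = fetch(link)
                break
            except Exception:  # broad on purpose: `fetch` is supplied by the caller
                sleep(base_delay * 2 ** attempt)  # wait longer after each failure
        else:
            # Reached only when no attempt succeeded
            still_failed.append(link)
    return results, still_failed


# Hypothetical fetcher: fails on the first call for each link, and always for "bad".
calls = {}


def flaky_fetch(link):
    calls[link] = calls.get(link, 0) + 1
    if calls[link] == 1 or link == "bad":
        raise ConnectionError(link)
    return f"content of {link}"


results, still_failed = retry_failed(["a", "b", "bad"], flaky_fetch, sleep=lambda s: None)
```

With the real scraper, `fetch` might be something like `lambda link: requests.get(link, timeout=10)`, where the timeout value is an arbitrary choice for illustration.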


In [6]:
# Create a DataFrame of the tutors
df = pd.DataFrame(data, columns=['Name','Country','English Level','Price','Diploma','Certificate','Response Time','No Of Lessons','Stars','Reviews'])

##### This is our raw dataset

In [7]:
df

Unnamed: 0,Name,Country,English Level,Price,Diploma,Certificate,Response Time,No Of Lessons,Stars,Reviews


In [8]:
# df.to_csv('Data Frame (After Setp 1).csv', index=False)
# df.to_excel('Data Frame (After Setp 1).xlsx', index=False)