# Competitive Landscape SQL Project

You were hired by Ironhack to perform an Analytics Consulting Project entitled: competitive landscape.

Your mission is to create and populate an appropriate database with many coding schools that are our competition, and to design suitable queries that answer business questions of interest (to be defined by you).

Bonus: How will this data model be updated in the future? Please write auxiliary functions that test the database for data quality issues. For example: how could you make sure you only include the most recent comments when you re-run the script?
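One possible approach to the bonus is sketched below. It assumes each scraped review is a dict carrying a unique `comment_id` (the column counted in the queries of this notebook). The helper names `filter_new_comments` and `find_quality_issues` are hypothetical and not part of the original project.

```python
def filter_new_comments(existing_ids, scraped_comments):
    """Return only the scraped comments whose comment_id is not stored yet."""
    seen = set(existing_ids)
    fresh = []
    for comment in scraped_comments:
        cid = comment["comment_id"]
        if cid not in seen:
            seen.add(cid)  # also guards against duplicates within one scrape
            fresh.append(comment)
    return fresh


def find_quality_issues(comments, rating_fields=("overall", "curriculum", "jobSupport")):
    """List (comment_id, field, value) for ratings outside 1-5; missing values are skipped."""
    issues = []
    for comment in comments:
        for field in rating_fields:
            value = comment.get(field)
            if value is not None and not 1 <= value <= 5:
                issues.append((comment["comment_id"], field, value))
    return issues


# Illustrative data only, not taken from the real database.
stored_ids = [1, 2, 3]
scraped = [
    {"comment_id": 3, "overall": 5, "jobSupport": 4},
    {"comment_id": 4, "overall": 7, "jobSupport": None},
    {"comment_id": 4, "overall": 7, "jobSupport": None},
]
new_rows = filter_new_comments(stored_ids, scraped)
print(new_rows)
print(find_quality_issues(new_rows))
```

At the database level, a primary key on `comment_id` combined with MySQL's `INSERT IGNORE` could serve the same purpose, assuming the table is defined that way.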

Crucial hint: check out the following tutorial: https://www.dataquest.io/blog/sql-insert-tutorial/


schools = {   
'ironhack' : 10828,
'app-academy' : 10525,
'springboard' : 11035,
'le-wagon' : 10868,
'codeworks' : 10650,
}

# Our Queries

In [1]:
import mysql.connector

In [None]:
from getpass import getpass  # prompts without echoing the password

cnx = mysql.connector.connect(user="root", password=getpass("password: "), host="localhost", database="competitive_landscape")

In [3]:
cnx.is_connected()

True

In [4]:
cursor = cnx.cursor()
cursor

<mysql.connector.cursor.MySQLCursor at 0x1c65432c6a0>

In [5]:
def queries_execute(x):
    """Run the SQL statement x on the shared cursor and return all rows."""
    cursor.execute(x)
    queries_result = cursor.fetchall()
    return queries_result
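
`queries_execute` returns bare tuples, so every call in this notebook has to type the column names again. A minimal sketch of an alternative reads them from `cursor.description`, which the Python DB-API defines for cursors. To keep the example runnable without a MySQL server, the standard-library `sqlite3` module stands in for `mysql.connector`; `query_to_records` and the toy table are assumptions made for illustration.

```python
import sqlite3


def query_to_records(cursor, sql, params=()):
    """Run a query and return (column_names, rows) using cursor.description."""
    cursor.execute(sql, params)
    columns = [col[0] for col in cursor.description]
    return columns, cursor.fetchall()


# sqlite3 stands in for mysql.connector here; the data is made up.
demo_cnx = sqlite3.connect(":memory:")
demo_cursor = demo_cnx.cursor()
demo_cursor.execute("CREATE TABLE comments (comment_id INTEGER, school TEXT)")
demo_cursor.executemany(
    "INSERT INTO comments VALUES (?, ?)",
    [(1, "ironhack"), (2, "ironhack"), (3, "le-wagon")],
)

columns, rows = query_to_records(
    demo_cursor,
    "SELECT school, COUNT(comment_id) AS total_reviews "
    "FROM comments GROUP BY school ORDER BY school",
)
print(columns)
print(rows)
```

The pair can then be passed straight to `pd.DataFrame(rows, columns=columns)`.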

In [6]:
import pandas as pd

# First Question:
Does jobSupport have a great influence on overallScore? How different are the evaluations from students of bootcamp/full-time courses and those of part-time/online courses?

QUESTION: How many reviews for each school? 

In [7]:
reviews_school = queries_execute("""SELECT school, COUNT(comment_id)
FROM competitive_landscape.comments
GROUP BY school;""")

pd.DataFrame(reviews_school,columns=['school','total reviews'])

Unnamed: 0,school,total reviews
0,app-academy,1046
1,ironhack,1045
2,le-wagon,1966
3,springboard,914


> Answer: There are 1966 reviews for Le Wagon, 1046 for App Academy, 1045 for Ironhack, 914 for Springboard and 93 for Codeworks (Codeworks is not shown in the output above).

QUESTION: Do the jobSupport evaluations cover the whole 1-5 range, or only part of it?

Ironhack:

In [8]:
ironhack_reviews = queries_execute("""SELECT DISTINCT jobSupport
FROM competitive_landscape.comments
WHERE school_id = '10828'
ORDER BY jobSupport ASC;""")

pd.DataFrame(ironhack_reviews,columns=['range_reviews'])

Unnamed: 0,range_reviews
0,
1,1.0
2,2.0
3,3.0
4,4.0
5,5.0


App Academy:

In [9]:
appacademy_reviews = queries_execute("""SELECT DISTINCT jobSupport
FROM competitive_landscape.comments
WHERE school_id = '10525'
ORDER BY jobSupport ASC;""")

pd.DataFrame(appacademy_reviews,columns=['range_reviews'])

Unnamed: 0,range_reviews
0,
1,1.0
2,2.0
3,3.0
4,4.0
5,5.0


Springboard:

In [10]:
springboard_reviews = queries_execute("""SELECT DISTINCT jobSupport
FROM competitive_landscape.comments
WHERE school_id = '11035'
ORDER BY jobSupport ASC;""")

pd.DataFrame(springboard_reviews,columns=['range_reviews'])

Unnamed: 0,range_reviews
0,
1,1.0
2,2.0
3,3.0
4,4.0
5,5.0


Le-wagon:

In [11]:
wagon_reviews = queries_execute("""SELECT DISTINCT jobSupport
FROM competitive_landscape.comments
WHERE school_id = '10868'
ORDER BY jobSupport ASC;""")

pd.DataFrame(wagon_reviews,columns=['range_reviews'])

Unnamed: 0,range_reviews
0,
1,1.0
2,2.0
3,3.0
4,4.0
5,5.0


> Answer: All the schools have reviews regarding job support from 1 to 5. 
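
The four queries above differ only in `school_id`. As a sketch, a single grouped query can return the same information for every school at once. The example runs against an in-memory `sqlite3` table with made-up rows as a stand-in for the MySQL `comments` table; the SQL itself uses only a standard `GROUP BY`.

```python
import sqlite3
from collections import defaultdict

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE comments (comment_id INTEGER, school_id INTEGER, jobSupport REAL)")
demo.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [(1, 10828, 5.0), (2, 10828, 4.0), (3, 10828, None),
     (4, 10525, 5.0), (5, 10525, 1.0), (6, 10525, 5.0)],
)

# One row per (school, rating) pair instead of one query per school.
rows = demo.execute(
    """SELECT school_id, jobSupport, COUNT(comment_id)
       FROM comments
       GROUP BY school_id, jobSupport
       ORDER BY school_id, jobSupport"""
).fetchall()

ratings_by_school = defaultdict(dict)
for school_id, rating, n_reviews in rows:
    ratings_by_school[school_id][rating] = n_reviews

for school_id, counts in ratings_by_school.items():
    print(school_id, counts)
```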

Let's look at the overall score

Ironhack:

In [12]:
ironhack_ovscore = queries_execute("""SELECT overallScore, COUNT(comment_id)
FROM competitive_landscape.comments
WHERE school_id = '10828'
GROUP BY overallScore
ORDER BY overallScore ASC;""")

pd.DataFrame(ironhack_ovscore,columns=['overall score','reviews'])

Unnamed: 0,overall score,reviews
0,,5
1,1.0,1
2,1.3,1
3,1.7,2
4,2.3,3
5,2.7,5
6,3.0,5
7,3.3,6
8,3.7,7
9,4.0,34


> Answer: Most of Ironhack's reviews have an overall score between 4.7 and 5.0.

App Academy:

In [13]:
appacademy_ovscore = queries_execute("""SELECT overallScore, COUNT(comment_id)
FROM competitive_landscape.comments
WHERE school_id = '10525'
GROUP BY overallScore
ORDER BY overallScore ASC;""")

pd.DataFrame(appacademy_ovscore,columns=['overall score','reviews'])

Unnamed: 0,overall score,reviews
0,,3
1,1.0,3
2,1.3,1
3,1.7,1
4,2.0,2
5,2.3,3
6,2.5,2
7,2.7,5
8,3.0,8
9,3.3,14


> Answer: Most of App Academy's reviews have an overall score between 4.7 and 5.0, with a smaller peak at 4.0.

Springboard:

In [14]:
springboard_ovscore = queries_execute("""SELECT overallScore, COUNT(comment_id)
FROM competitive_landscape.comments
WHERE school_id = '11035'
GROUP BY overallScore
ORDER BY overallScore ASC;""")

pd.DataFrame(springboard_ovscore,columns=['overall score','reviews'])

Unnamed: 0,overall score,reviews
0,1.0,7
1,1.3,3
2,1.5,1
3,1.7,4
4,2.0,3
5,2.3,1
6,2.7,4
7,3.0,9
8,3.3,9
9,3.5,2


> Answer: Most of Springboard's reviews have an overall score between 4.7 and 5.0, with a smaller peak at 4.0.

Le-Wagon:

In [15]:
wagon_ovscore = queries_execute("""SELECT overallScore, COUNT(comment_id)
FROM competitive_landscape.comments
WHERE school_id = '10868'
GROUP BY overallScore
ORDER BY overallScore ASC;""")

pd.DataFrame(wagon_ovscore,columns=['overall score','reviews'])

Unnamed: 0,overall score,reviews
0,,3
1,1.0,3
2,2.3,2
3,2.7,1
4,3.0,1
5,3.3,3
6,3.5,1
7,3.7,2
8,4.0,4
9,4.3,39


> Answer: Most of Le-wagon's reviews have an overall score between 4.7 and 5.0.

QUESTION: How many reviews gave each jobSupport rating from 1 to 5?

Ironhack:

In [16]:
iron_jobsup = queries_execute("""SELECT jobSupport, COUNT(comment_id)
FROM competitive_landscape.comments
WHERE school_id = '10828'
GROUP BY jobSupport
ORDER BY jobSupport ASC;""")

pd.DataFrame(iron_jobsup,columns=['job support','reviews'])

Unnamed: 0,job support,reviews
0,,46
1,1.0,8
2,2.0,7
3,3.0,16
4,4.0,106
5,5.0,862


> Answer: Ironhack has 862 reviews which gave 5 in jobSupport; 106 - 4; 16 - 3; 7 - 2 and 8 reviews with 1 in jobSupport (also 46 null) 

App academy:

In [17]:
appacad_jobsup = queries_execute("""SELECT jobSupport, COUNT(comment_id)
FROM competitive_landscape.comments
WHERE school_id = '10525'
GROUP BY jobSupport
ORDER BY jobSupport ASC;""")

pd.DataFrame(appacad_jobsup,columns=['job support','reviews'])

Unnamed: 0,job support,reviews
0,,103
1,1.0,11
2,2.0,11
3,3.0,78
4,4.0,259
5,5.0,584


> Answer: App Academy has 584 reviews with 5 in jobSupport; 259 - 4; 78 - 3; 11 - 2 and 11 reviews with 1 in jobSupport (also 103 null)

Springboard:

In [18]:
spring_jobsup = queries_execute("""SELECT jobSupport, COUNT(comment_id)
FROM competitive_landscape.comments
WHERE school_id = '11035'
GROUP BY jobSupport
ORDER BY jobSupport ASC;""")

pd.DataFrame(spring_jobsup,columns=['job support','reviews'])

Unnamed: 0,job support,reviews
0,,110
1,1.0,25
2,2.0,6
3,3.0,71
4,4.0,198
5,5.0,504


> Answer: Springboard has 504 reviews with 5 in jobSupport; 198 - 4; 71 - 3; 6 - 2 and 25 reviews with 1 in jobSupport (also 110 null)

Le-wagon:

In [19]:
wagon_jobsup = queries_execute("""SELECT jobSupport, COUNT(comment_id)
FROM competitive_landscape.comments
WHERE school_id = '10868'
GROUP BY jobSupport
ORDER BY jobSupport ASC;""")

pd.DataFrame(wagon_jobsup,columns=['job support','reviews'])

Unnamed: 0,job support,reviews
0,,170
1,1.0,7
2,2.0,2
3,3.0,26
4,4.0,213
5,5.0,1548


> Answer: Le-wagon has 1548 reviews with 5 in jobSupport; 213 - 4; 26 - 3; 2 - 2 and 7 reviews with 1 in jobSupport (also 170 null)

QUESTION: Does jobSupport have a great influence on overallScore?

Relation between jobSupport and program at Ironhack

Ironhack - Data Analytics:

In [20]:
iron_data = queries_execute("""SELECT overall,jobSupport,COUNT(comment_id)
FROM competitive_landscape.comments 
WHERE school = 'ironhack' AND program IN ('Data Analytics Bootcamp', 'Data Analytics Part-Time')
GROUP BY jobSupport 
ORDER BY jobSupport ASC;""")

pd.DataFrame(iron_data,columns=['overall','job support','reviews'])

Unnamed: 0,overall,job support,reviews
0,4.0,3.0,3
1,4.0,4.0,8
2,5.0,5.0,23


Ironhack - UX/UI:

In [21]:
iron_ui_ux = queries_execute("""SELECT jobSupport,overall,COUNT(comment_id)
FROM competitive_landscape.comments 
WHERE school = 'ironhack' AND program IN ('UX/UI Design Bootcamp', 'UX/UI Design Part-Time')
GROUP BY jobSupport 
ORDER BY jobSupport ASC;""")

pd.DataFrame(iron_ui_ux,columns=['job support','overall','reviews'])

Unnamed: 0,job support,overall,reviews
0,,4.0,1
1,1.0,3.0,3
2,2.0,2.0,1
3,3.0,5.0,5
4,4.0,5.0,13
5,5.0,5.0,54


Ironhack - Web Development:

In [22]:
iron_web = queries_execute("""SELECT overall,jobSupport,COUNT(comment_id)
FROM competitive_landscape.comments 
WHERE school = 'ironhack' AND program IN ('Web Development Bootcamp', 'Web Development Part-Time')
GROUP BY jobSupport 
ORDER BY jobSupport ASC;""")

pd.DataFrame(iron_web,columns=['overall','job support','reviews'])

Unnamed: 0,overall,job support,reviews
0,5.0,,1
1,3.0,1.0,2
2,4.0,2.0,2
3,5.0,4.0,22
4,5.0,5.0,125


> Answer: In general, there isn't a strong connection between jobSupport and overall ratings; that is the case for Data Analytics. However, in UX/UI there are cases of a lower jobSupport with a higher overall rating. The same holds for Web Development.
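
A caveat on the three queries above: `overall` is selected but is neither part of the `GROUP BY` clause nor aggregated. MySQL only accepts such a query when the `ONLY_FULL_GROUP_BY` SQL mode is disabled, and in that case it is free to return the `overall` value of any row in each `jobSupport` group, so the `overall` column shown is not necessarily representative. One way to make the comparison well defined, sketched here with `sqlite3` and made-up rows standing in for the real table, is to aggregate `overall` explicitly.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE comments (comment_id INTEGER, overall REAL, jobSupport REAL)")
demo.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [(1, 5.0, 3.0), (2, 3.0, 3.0), (3, 4.0, 4.0), (4, 5.0, 5.0), (5, 5.0, 5.0)],
)

# AVG(overall) summarises every review in the group instead of an arbitrary one.
avg_rows = demo.execute(
    """SELECT jobSupport, AVG(overall), COUNT(comment_id)
       FROM comments
       GROUP BY jobSupport
       ORDER BY jobSupport"""
).fetchall()
print(avg_rows)
```

Grouping by both `jobSupport` and `overall` is another option when the full cross-tabulation is wanted rather than an average.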

Relation between jobSupport and program at app academy

App Academy Software Engineer

In [23]:
app_job = queries_execute("""SELECT overall,jobSupport,COUNT(comment_id)
FROM competitive_landscape.comments 
WHERE school = 'app-academy' AND program = 'Software Engineer Track: In-Person'
GROUP BY jobSupport 
ORDER BY jobSupport ASC;""")

pd.DataFrame(app_job,columns=['overall','job support','reviews'])

Unnamed: 0,overall,job support,reviews
0,5.0,,65
1,3.0,1.0,8
2,4.0,2.0,9
3,5.0,3.0,58
4,5.0,4.0,202
5,5.0,5.0,419


In [24]:
app_job_on = queries_execute("""SELECT overall,jobSupport,COUNT(comment_id)
FROM competitive_landscape.comments 
WHERE school = 'app-academy' AND program = 'Software Engineer Track: Online'
GROUP BY jobSupport 
ORDER BY jobSupport ASC;""")

pd.DataFrame(app_job_on,columns=['overall','job support','reviews'])

Unnamed: 0,overall,job support,reviews
0,5.0,,2
1,1.0,1.0,1
2,4.0,3.0,7
3,4.0,4.0,27
4,5.0,5.0,67


> Answer: Since App Academy only has courses associated with software engineering, we focus only on them. In Software Engineer Track: In-Person, there are some differences between jobSupport and overall ratings. For example, there are cases of a jobSupport of 3 with an overall rating of 5, and cases of a jobSupport of 2 with an overall of 4.
In Software Engineer Track: Online, there are fewer differences, except for a jobSupport of 3 with an overall of 4.
It is worth mentioning that many more students attended the "In-Person" track than the "Online" track.

Relation between jobSupport and program at Springboard

Springboard - Data Analytics:

In [25]:
spring_data = queries_execute("""SELECT overall,jobSupport,COUNT(comment_id) 
FROM competitive_landscape.comments 
WHERE school = 'springboard' AND program IN ('Business Analytics', 'Data Analytics Career Track')
GROUP BY jobSupport
ORDER BY jobSupport ASC;""")

pd.DataFrame(spring_data,columns=['overall','job support','reviews'])

Unnamed: 0,overall,job support,reviews
0,5.0,,5
1,1.0,1.0,4
2,4.0,3.0,6
3,4.0,4.0,30
4,5.0,5.0,56


Springboard - Cybersecurity:

In [26]:
spring_cyber = queries_execute("""SELECT overall,jobSupport,COUNT(comment_id) 
FROM competitive_landscape.comments 
WHERE school = 'springboard' AND program IN ('Cybersecurity Career Track', 'Introduction to Cybersecurity')
GROUP BY jobSupport
ORDER BY jobSupport ASC;""")

pd.DataFrame(spring_cyber,columns=['overall','job support','reviews'])

Unnamed: 0,overall,job support,reviews
0,5.0,5.0,2


Springboard - Data Science/Machine Learning:

In [27]:
spring_ml = queries_execute("""SELECT overall,jobSupport,COUNT(comment_id) 
FROM competitive_landscape.comments 
WHERE school = 'springboard' AND program IN ('Data Science Career Track', 'Data Science Career Track Prep', 'Intermediate Data Science', 'Introduction to Data Science', 'Machine Learning Engineering Career Track')
GROUP BY jobSupport
ORDER BY jobSupport ASC;""")

pd.DataFrame(spring_ml,columns=['overall','job support','reviews'])

Unnamed: 0,overall,job support,reviews
0,5.0,,32
1,4.0,1.0,8
2,5.0,2.0,3
3,5.0,3.0,18
4,5.0,4.0,65
5,5.0,5.0,159


Springboard - Digital Marketing:

In [28]:
spring_digital = queries_execute("""SELECT overall,jobSupport,COUNT(comment_id) 
FROM competitive_landscape.comments 
WHERE school = 'springboard' AND program IN ('Digital Marketing Career Track', 'Digital Marketing for Professionals')
GROUP BY jobSupport
ORDER BY jobSupport ASC;""")

pd.DataFrame(spring_digital,columns=['overall','job support','reviews'])

Unnamed: 0,overall,job support,reviews
0,4.0,,5
1,5.0,3.0,3
2,5.0,4.0,3
3,5.0,5.0,11


Springboard - Software Engineering:

In [29]:
spring_soft = queries_execute("""SELECT overall,jobSupport,COUNT(comment_id) 
FROM competitive_landscape.comments 
WHERE school = 'springboard' AND program IN ('Software Engineering Career Track', 'Software Engineering Career Track Prep Course')
GROUP BY jobSupport
ORDER BY jobSupport ASC;""")

pd.DataFrame(spring_soft,columns=['overall','job support','reviews'])

Unnamed: 0,overall,job support,reviews
0,2.0,,1
1,1.0,1.0,1
2,2.0,2.0,1
3,5.0,4.0,2
4,5.0,5.0,20


Springboard - UI/UX Design:

In [30]:
spring_uiux = queries_execute("""SELECT overall,jobSupport,COUNT(comment_id) 
FROM competitive_landscape.comments 
WHERE school = 'springboard' AND program IN ('UI/UX Design Career Track', 'UX Career Track', 'UX Design')
GROUP BY jobSupport
ORDER BY jobSupport ASC;""")

pd.DataFrame(spring_uiux,columns=['overall','job support','reviews'])

Unnamed: 0,overall,job support,reviews
0,5.0,,45
1,4.0,1.0,6
2,4.0,2.0,1
3,5.0,3.0,34
4,5.0,4.0,76
5,5.0,5.0,180


> Answers: In Data Analytics at Springboard, the only difference is an overall rating above 3 paired with a jobSupport of only 3. Nothing to consider in Cybersecurity. In Data Science/Machine Learning, however, there are several cases of an overall rating higher than the jobSupport rating; for example, an overall of 4 with a jobSupport of 1.
In Digital Marketing there are also differences, for example an overall of 5 with a jobSupport of 3. Nothing to consider in Software Engineering. In UI/UX there are also significant differences, for example an overall of 4.0 with a jobSupport of 1.0.

Relation between jobSupport and program at Le-wagon:

Le-wagon - FullStack program:

In [31]:
wagon_stack = queries_execute("""SELECT overall,jobSupport,COUNT(comment_id) 
FROM competitive_landscape.comments 
WHERE school_id = 10868 AND program IN ('FullStack program - 35+ locations', 'FullStack program')
GROUP BY jobSupport
ORDER BY jobSupport ASC;""")

pd.DataFrame(wagon_stack,columns=['overall','job support','reviews'])

Unnamed: 0,overall,job support,reviews
0,5.0,,98
1,3.0,1.0,3
2,4.0,2.0,2
3,5.0,3.0,14
4,5.0,4.0,157
5,5.0,5.0,1070


Le-wagon - Web Development:

In [32]:
wagon_web = queries_execute("""SELECT overall,jobSupport,COUNT(comment_id) 
FROM competitive_landscape.comments 
WHERE school_id = 10868 AND program IN ('Web Development Course - Part-Time', 'Web Development Course - Full-Time')
GROUP BY jobSupport
ORDER BY jobSupport ASC;""")

pd.DataFrame(wagon_web,columns=['overall','job support','reviews'])

Unnamed: 0,overall,job support,reviews
0,5.0,,5
1,4.0,1.0,2
2,5.0,3.0,8
3,5.0,4.0,36
4,5.0,5.0,294


Le-wagon - Data Science:

In [33]:
wagon_full = queries_execute("""SELECT overall,jobSupport,COUNT(comment_id) 
FROM competitive_landscape.comments 
WHERE school_id = 10868 AND program = 'Data Science - Full-Time'
GROUP BY jobSupport
ORDER BY jobSupport ASC;""")

pd.DataFrame(wagon_full,columns=['overall','job support','reviews'])

Unnamed: 0,overall,job support,reviews
0,5.0,3.0,1
1,5.0,4.0,3
2,5.0,5.0,32


> Answers: For the FullStack program there are some cases of a jobSupport of 3 with an overall of 5. For Web Development there are some cases of a jobSupport of 1 with an overall of 4, or a jobSupport of 3 with an overall of 5. For Data Science there is one case of a jobSupport of 3 with an overall of 5. There are no other significant differences.
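
The comparisons above are made by eye. As a complementary, hedged sketch, a Pearson correlation coefficient between `jobSupport` and `overall` would summarise the relationship in one number per school or program. The function `pearson` and the sample pairs are illustrative assumptions; in practice the pairs would come from a query that selects `jobSupport` and `overall` for rows where both are non-null.

```python
from math import sqrt


def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Made-up (jobSupport, overall) pairs, not real review data.
sample = [(5, 5), (4, 5), (3, 4), (1, 3), (5, 4), (2, 4)]
r = pearson(sample)
print(round(r, 3))
```

A value near 1 would indicate that the two ratings move together, while a value near 0 would indicate little linear relationship.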

QUESTION: How different is the evaluation from the students of bootcamp/full-time courses and part-time/online courses?

Difference between ratings online vs campus. The case of App Academy

In [34]:
app_in_person = queries_execute("""SELECT overallScore,overall,curriculum,jobSupport,COUNT(comment_id)
FROM competitive_landscape.comments 
WHERE school = 'app-academy' AND program = 'Software Engineer Track: In-Person' AND overallScore < '3'
GROUP BY overallScore
ORDER BY overallScore ASC;""")

pd.DataFrame(app_in_person,columns=['overall score','overall','curriculum','job support','reviews'])

Unnamed: 0,overall score,overall,curriculum,job support,reviews
0,1.0,1.0,1.0,1.0,1
1,1.3,1.0,2.0,1.0,1
2,1.7,1.0,3.0,1.0,1
3,2.0,2.0,3.0,1.0,1
4,2.3,2.0,3.0,2.0,1
5,2.7,3.0,4.0,1.0,5


In [35]:
app_online = queries_execute("""SELECT overallScore,overall,curriculum,jobSupport,COUNT(comment_id)
FROM competitive_landscape.comments 
WHERE school = 'app-academy' AND program = 'Software Engineer Track: Online' AND overallScore < '3'
GROUP BY overallScore
ORDER BY overallScore ASC;""")

pd.DataFrame(app_online,columns=['overall score','overall','curriculum','job support','reviews'])

Unnamed: 0,overall score,overall,curriculum,job support,reviews
0,1.0,1.0,1.0,1.0,1
1,2.3,2.0,2.0,3.0,1


> Answers: As seen earlier, there are more reviews for "Software Engineer Track: In-Person" than for "Online".
Considering only the reviews with an overallScore under 3 (negative reviews), curriculum is rated better for "In-Person" than for "Online". jobSupport is negative in all the "In-Person" reviews; only one "Online" review is positive.

Difference between ratings bootcamp vs part-time. The case of Ironhack

Data Analytics Bootcamp vs Part-Time:

In [36]:
iron_boot = queries_execute("""SELECT overallScore,overall,curriculum,jobSupport,COUNT(comment_id)
FROM competitive_landscape.comments 
WHERE school = 'ironhack' AND program = 'Data Analytics Bootcamp' AND overallScore >= '3'
GROUP BY overallScore
ORDER BY overallScore ASC;""")

pd.DataFrame(iron_boot,columns=['overall score','overall','curriculum','job support','reviews'])

Unnamed: 0,overall score,overall,curriculum,job support,reviews
0,3.3,4.0,3.0,3.0,1
1,3.7,4.0,4.0,3.0,2
2,4.0,4.0,4.0,4.0,2
3,4.3,5.0,4.0,4.0,2
4,4.7,5.0,4.0,5.0,6
5,5.0,5.0,5.0,5.0,19


In [37]:
iron_part = queries_execute("""SELECT overallScore,overall,curriculum,jobSupport,COUNT(comment_id)
FROM competitive_landscape.comments 
WHERE school = 'ironhack' AND program = 'Data Analytics Part-Time' AND overallScore >= '3'
GROUP BY overallScore
ORDER BY overallScore ASC;""")

pd.DataFrame(iron_part,columns=['overall score','overall','curriculum','job support','reviews'])

Unnamed: 0,overall score,overall,curriculum,job support,reviews
0,5.0,5.0,5.0,5.0,1


> Answers: In general, there are far fewer reviews for Part-Time. Restricting the query to negative reviews (overallScore < '3'), there is only one review under 3 in Data Analytics Bootcamp and none in Part-Time. That single negative Bootcamp review gave 1 in curriculum but 4 in jobSupport. Restricting it to positive reviews (overallScore >= '3'), there are 32 reviews with a score of 3 or higher in Data Analytics Bootcamp and one in Part-Time. The only positive Part-Time review gave 5 to everything.

UX/UI Design Bootcamp vs Part-Time:

In [38]:
iron_ux_boot = queries_execute("""SELECT overallScore,overall,curriculum,jobSupport,COUNT(comment_id)
FROM competitive_landscape.comments 
WHERE school = 'ironhack' AND program = 'UX/UI Design Bootcamp' AND overallScore >= '3'
GROUP BY overallScore
ORDER BY overallScore ASC;""")

pd.DataFrame(iron_ux_boot,columns=['overall score','overall','curriculum','job support','reviews'])

Unnamed: 0,overall score,overall,curriculum,job support,reviews
0,3.3,2.0,5.0,3.0,1
1,3.7,4.0,4.0,3.0,2
2,4.0,4.0,4.0,4.0,5
3,4.3,5.0,4.0,4.0,5
4,4.7,5.0,4.0,5.0,7
5,5.0,5.0,5.0,5.0,38


In [39]:
iron_ux_part = queries_execute("""SELECT overallScore,overall,curriculum,jobSupport,COUNT(comment_id)
FROM competitive_landscape.comments 
WHERE school = 'ironhack' AND program = 'UX/UI Design Part-Time' AND overallScore >= '3'
GROUP BY overallScore
ORDER BY overallScore ASC;""")

pd.DataFrame(iron_ux_part,columns=['overall score','overall','curriculum','job support','reviews'])

Unnamed: 0,overall score,overall,curriculum,job support,reviews
0,4.0,4.0,4.0,4.0,1
1,4.3,5.0,5.0,3.0,3
2,4.7,4.0,5.0,5.0,5
3,5.0,5.0,5.0,5.0,6


> Answers: In general, there are far fewer reviews for Part-Time. Restricting the query to negative reviews (overallScore < '3'), there are only two reviews under 3 in UX/UI Design Bootcamp and one in Part-Time. The only review under 3 in Part-Time gave 3 in curriculum but 1 in jobSupport. One of the reviews under 3 in Bootcamp gave 3 in overall and curriculum but 1 in jobSupport. Restricting it to positive reviews (overallScore >= '3'), only one Bootcamp row has a negative overall rating. The rest of the data shows little difference between Bootcamp and Part-Time.

Web Development Bootcamp vs Part-Time

In [40]:
iron_web_boot = queries_execute("""SELECT overallScore,overall,curriculum,jobSupport,COUNT(comment_id)
FROM competitive_landscape.comments 
WHERE school = 'ironhack' AND program = 'Web Development Bootcamp' AND overallScore >= '3'
GROUP BY overallScore
ORDER BY overallScore ASC;""")

pd.DataFrame(iron_web_boot,columns=['overall score','overall','curriculum','job support','reviews'])

Unnamed: 0,overall score,overall,curriculum,job support,reviews
0,3.0,4.0,3.0,2.0,2
1,4.0,3.0,4.0,5.0,2
2,4.3,5.0,4.0,4.0,11
3,4.7,5.0,4.0,5.0,17
4,5.0,5.0,5.0,5.0,101


In [41]:
iron_web_part = queries_execute("""SELECT overallScore,overall,curriculum,jobSupport,COUNT(comment_id)
FROM competitive_landscape.comments 
WHERE school = 'ironhack' AND program = 'Web Development Part-Time' AND overallScore >= '3'
GROUP BY overallScore
ORDER BY overallScore ASC;""")

pd.DataFrame(iron_web_part,columns=['overall score','overall','curriculum','job support','reviews'])

Unnamed: 0,overall score,overall,curriculum,job support,reviews
0,4.3,4.0,4.0,5.0,2
1,4.7,4.0,5.0,5.0,3
2,5.0,5.0,5.0,5.0,12


> Answers: In general, there are far fewer reviews for Part-Time. Restricting the query to negative reviews (overallScore < '3'), there is only one review each for 'Bootcamp' and 'Part-Time'. The only negative 'Part-Time' review gave 1 to every single parameter. The only negative 'Bootcamp' review gave 4 in curriculum but 1 in jobSupport. Restricting it to positive reviews (overallScore >= '3'), there are five distinct overallScore values for 'Bootcamp' and three for 'Part-Time'. The 'Part-Time' ratings are higher than the 'Bootcamp' ones: for example, none of the 'Part-Time' rows shows a rating below 4 in any parameter.

Difference between ratings fulltime vs part-time. The case of Le-wagon

Web Development Full-Time vs Part-Time:

In [42]:
wagon_full = queries_execute("""SELECT overallScore,overall,curriculum,jobSupport,COUNT(comment_id)
FROM competitive_landscape.comments 
WHERE school_id = 10868 AND program = 'Web Development Course - Full-Time'
GROUP BY overallScore
ORDER BY overallScore ASC;""")

pd.DataFrame(wagon_full,columns=['overall score','overall','curriculum','job support','reviews'])

Unnamed: 0,overall score,overall,curriculum,job support,reviews
0,1.0,1.0,1.0,1.0,1
1,2.7,4.0,3.0,1.0,1
2,3.7,4.0,4.0,3.0,1
3,4.0,5.0,4.0,3.0,1
4,4.3,5.0,5.0,3.0,9
5,4.7,5.0,5.0,4.0,38
6,5.0,5.0,5.0,5.0,254


In [43]:
wagon_part = queries_execute("""SELECT overallScore,overall,curriculum,jobSupport,COUNT(comment_id)
FROM competitive_landscape.comments 
WHERE school_id = 10868 AND program = 'Web Development Course - Part-Time'
GROUP BY overallScore
ORDER BY overallScore ASC;""")

pd.DataFrame(wagon_part,columns=['overall score','overall','curriculum','job support','reviews'])

Unnamed: 0,overall score,overall,curriculum,job support,reviews
0,4.3,5.0,5.0,3.0,1
1,4.7,5.0,5.0,4.0,3
2,5.0,5.0,5.0,5.0,36


> Answers: In general, there are far fewer reviews for Part-Time. Looking only at negative reviews (overallScore < '3'), there are only two Full-Time reviews under 3. The first negative review gave 1 to every single parameter. The second negative review gave 4 overall but 1 in jobSupport. Looking only at positive reviews (overallScore >= '3'), most reviews give a score of 5 in all parameters in both Full-Time and Part-Time.

All in all, there are insufficient elements to draw conclusions about how evaluations differ between bootcamp and part-time courses. It is notable, however, that according to the reviews more students attend bootcamps than part-time courses.
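
Because each format was queried separately, the comparison above relies on reading several tables side by side. A sketch of a single query that labels each review by format and reports the count and mean score is given below. It assumes part-time programs can be recognised by "Part-Time" in the program name, which fits the program names used in this notebook but remains a heuristic; `sqlite3` with made-up rows stands in for MySQL.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE comments (comment_id INTEGER, program TEXT, overallScore REAL)")
demo.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [(1, "Data Analytics Bootcamp", 5.0), (2, "Data Analytics Bootcamp", 4.0),
     (3, "Data Analytics Part-Time", 5.0), (4, "Web Development Part-Time", 4.0)],
)

# CASE assigns a format label per review; the label is then used for grouping.
format_rows = demo.execute(
    """SELECT CASE WHEN program LIKE '%Part-Time%' THEN 'part-time'
                   ELSE 'full-time' END AS course_format,
              COUNT(comment_id), AVG(overallScore)
       FROM comments
       GROUP BY course_format
       ORDER BY course_format"""
).fetchall()
print(format_rows)
```

Reporting the count next to the mean keeps the small part-time sample sizes visible when the two formats are compared.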

# Second Question: 
Is there any relation between graduating year and review scores?

Let's analyse IRONHACK first

Which programs are reviewed in the comments:

In [44]:
iron_prog = queries_execute("""SELECT DISTINCT program 
FROM competitive_landscape.comments 
WHERE school = 'ironhack' ORDER BY program ASC;""")
iron_prog_df = pd.DataFrame(iron_prog,columns=['Ironhack programms reviewed in comments'])
iron_prog_df = iron_prog_df.drop(labels=[0,1], axis=0)
iron_prog_df

Unnamed: 0,Ironhack programms reviewed in comments
2,Data Analytics Bootcamp
3,Data Analytics Part-Time
4,Full-time UX/UI Design Bootcamp
5,Full-time Web Development Bootcamp
6,Full-time Web Development Bootcamp
7,Part-time UX/UI Design
8,Part-time UX/UI Design
9,Part-time Web Development
10,UX/UI Design Bootcamp
11,UX/UI Design Part-Time
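
The list above contains entries that look identical, such as "Full-time Web Development Bootcamp" and "Part-time UX/UI Design" appearing twice. A likely cause is trailing whitespace in the stored names, since later queries in this notebook match both `'Part-time UX/UI Design'` and `'Part-time UX/UI Design '`. A hedged sketch of a clean-up step follows; `normalize_programs` is a hypothetical helper and the input list is illustrative.

```python
def normalize_programs(names):
    """Strip whitespace, drop empty or missing names, and de-duplicate in order."""
    cleaned = []
    seen = set()
    for name in names:
        if name is None:
            continue
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            cleaned.append(name)
    return cleaned


raw = [None, "", "Part-time UX/UI Design", "Part-time UX/UI Design ",
       "Full-time Web Development Bootcamp", "Full-time Web Development Bootcamp "]
programs = normalize_programs(raw)
print(programs)
```

Filtering by value in this way also avoids relying on `drop(labels=[0,1])`, which presumably assumes the missing and empty names always occupy the first two rows.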


Now let's evaluate the number of reviews by year associated with Ironhack:

In [45]:
iron_total_reviews = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),
AVG(overall),AVG(curriculum),AVG(jobSupport)
FROM competitive_landscape.comments 
WHERE school = 'ironhack' GROUP BY graduatingYear ORDER BY graduatingYear ASC;""")

pd.DataFrame(iron_total_reviews,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2011.0,1,4.7,5.0,4.0,5.0
1,2014.0,9,4.711111,4.777778,4.666667,4.666667
2,2015.0,32,4.85,4.928571,4.722222,4.888889
3,2016.0,75,4.92973,4.945946,4.896552,4.896552
4,2017.0,198,4.871717,4.914141,4.805128,4.901042
5,2018.0,309,4.871197,4.906149,4.820847,4.887789
6,2019.0,217,4.765438,4.801843,4.700461,4.796296
7,2020.0,184,4.728804,4.820652,4.777174,4.576923
8,2021.0,20,4.815,4.95,4.8,4.7


> The year with the most reviews is 2018, followed by 2019.

> The overall score average is highest in 2016. The overall score average does not increase over time.

> The overall average is highest in 2011 (a single review). The overall average does not increase over time.

> The curriculum average is highest in 2016. The curriculum average does not increase over time.

> The jobSupport average is highest in 2011 (a single review), followed by 2017. The jobSupport average does not increase over time.
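
Statements such as "does not increase over time" can be checked with a simple trend estimate. The sketch below fits an ordinary least-squares slope to the yearly `overallScore` averages from the table above, leaving out 2011 because it rests on a single review. It is unweighted, so a year with 9 reviews counts as much as one with 309; `ols_slope` is a hypothetical helper.

```python
def ols_slope(points):
    """Least-squares slope of y against x for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# (graduatingYear, average overallScore) for Ironhack, copied from the table above.
ironhack_overall_score = [
    (2014, 4.711111), (2015, 4.85), (2016, 4.92973), (2017, 4.871717),
    (2018, 4.871197), (2019, 4.765438), (2020, 4.728804), (2021, 4.815),
]
slope = ols_slope(ironhack_overall_score)
print(f"change in average overallScore per year: {slope:+.4f}")
```

A slope close to zero would be consistent with the observation that the averages show no clear upward trend.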


Let's evaluate by course area:

DATA ANALYTICS:

In [46]:
iron_reviews_data = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school = 'ironhack' AND program IN ('Data Analytics Bootcamp', 'Data Analytics Part-Time')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(iron_reviews_data,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2019.0,8,4.4125,4.5,4.125,4.625
1,2020.0,26,4.711538,4.807692,4.730769,4.576923


> The year with more reviews is 2020. We only have comments associated with 2 different graduating years.

> The overall score average is higher in 2020. The overall score average increases over time.

> The overall average is higher in 2020. The overall average increases over time.

> The curriculum average is higher in 2020. The curriculum average increases over time.

> The jobSupport average is higher in 2019. The jobSupport average decreases over time.

UX/UI DESIGN:

In [47]:
iron_reviews_ux = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments
WHERE school = 'ironhack' AND program IN ('Full-time UX/UI Design Bootcamp', 'Part-time UX/UI Design', 'Part-time UX/UI Design ', 'UX/UI Design Bootcamp', 'UX/UI Design Part-Time')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")


pd.DataFrame(iron_reviews_ux,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])


Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2011.0,1,4.7,5.0,4.0,5.0
1,2016.0,7,4.957143,4.857143,5.0,5.0
2,2017.0,58,4.818966,4.896552,4.719298,4.839286
3,2018.0,85,4.778824,4.823529,4.72619,4.795181
4,2019.0,59,4.652542,4.745763,4.59322,4.610169
5,2020.0,40,4.5625,4.575,4.675,4.435897
6,2021.0,7,4.614286,4.857143,4.571429,4.428571


> The year with the most reviews is 2018.

> The overall score average is highest in 2016. The overall score average decreases over time.

> The overall average is highest in 2011 (only 1 review). The overall average does not increase over time.

> The curriculum average is highest in 2016. The curriculum average decreases over time.

> The jobSupport average is highest in 2011 and 2016. The jobSupport average decreases over time.

WEB DEVELOPMENT:

In [48]:
iron_web_reviews = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school = 'ironhack' AND program IN ('Full-time Web Development Bootcamp', 'Full-time Web Development Bootcamp ', 'Part-time Web Development', 'Web Design', 'Web Development Bootcamp', 'Web Development Part-Time')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(iron_web_reviews,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2014.0,8,4.7125,4.75,4.75,4.625
1,2015.0,31,4.844444,4.925926,4.705882,4.882353
2,2016.0,67,4.925758,4.954545,4.882353,4.882353
3,2017.0,127,4.900787,4.92126,4.848,4.943089
4,2018.0,207,4.912077,4.937198,4.859223,4.936275
5,2019.0,120,4.849167,4.866667,4.783333,4.891667
6,2020.0,93,4.852688,4.935484,4.860215,4.75
7,2021.0,8,4.9125,5.0,4.875,4.875


> The year with the most reviews is 2018.

> The overall score average is highest in 2016. The overall score does not increase over time.

> The overall average is highest in 2021. The overall average does not increase over time.

> The curriculum average is highest in 2016. The curriculum average does not increase over time.

> The jobSupport average is highest in 2017. The jobSupport average does not increase over time.

Let's analyse APP-ACADEMY:

Which programs are reviewed in the comments:

In [49]:
app_prog = queries_execute("""SELECT DISTINCT program 
FROM competitive_landscape.comments 
WHERE school = 'app-academy' ORDER BY program ASC;""")
app_prog_df = pd.DataFrame(app_prog,columns=['App-academy programs reviewed in comments'])
app_prog_df = app_prog_df.drop(labels=[0,1], axis=0)
app_prog_df

Unnamed: 0,App-academy programs reviewed in comments
2,App Academy Open
3,Bootcamp Prep
4,Software Engineer Track: In-Person
5,Software Engineer Track: Online


Now let's evaluate the number of reviews by year associated with App Academy:

In [50]:
app_total_reviews = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport)
FROM competitive_landscape.comments 
WHERE school = 'app-academy' GROUP BY graduatingYear ORDER BY graduatingYear ASC;""")

pd.DataFrame(app_total_reviews,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,,3,3.8,3.666667,4.666667,3.0
1,2013.0,11,4.9,5.0,4.875,4.666667
2,2014.0,16,4.8375,4.875,4.923077,4.615385
3,2015.0,17,4.54375,4.625,4.8125,4.1875
4,2016.0,93,4.812903,4.870968,4.880435,4.681818
5,2017.0,232,4.651948,4.695652,4.705628,4.519048
6,2018.0,203,4.671921,4.704433,4.714286,4.566667
7,2019.0,223,4.534081,4.650224,4.509009,4.422535
8,2020.0,210,4.427619,4.614286,4.328571,4.330144
9,2021.0,38,4.689474,4.789474,4.526316,4.742857


> App Academy only has courses related to software engineering.

> The year with the most reviews is 2017, followed by 2019.

> The overall score average is highest in 2013. The overall score average does not increase over time.

> The overall average is highest in 2013. The overall average decreases over time.

> The curriculum average is highest in 2014. The curriculum average decreases over time.

> The jobSupport average is highest in 2021 (38 reviews), followed by 2016. The jobSupport average does not increase over time.
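
Statements such as "highest in year X" and "does not increase over time" can be checked with a few lines of code instead of reading the table by eye. The sketch below uses the App Academy jobSupport figures from the table above; `peak_year`, `trend_slope` and the `min_reviews` threshold are hypothetical helpers introduced here for illustration, and the slope is a plain unweighted least-squares fit, which is only one of several reasonable ways to summarise a trend.

```python
# (graduatingYear, comments, jobSupport) copied from the App Academy table above;
# the row without a graduating year is left out.
app_job_support = [
    (2013, 11, 4.666667),
    (2014, 16, 4.615385),
    (2015, 17, 4.1875),
    (2016, 93, 4.681818),
    (2017, 232, 4.519048),
    (2018, 203, 4.566667),
    (2019, 223, 4.422535),
    (2020, 210, 4.330144),
    (2021, 38, 4.742857),
]

def peak_year(rows, min_reviews=0):
    """Return the (year, reviews, average) row with the highest average,
    ignoring years with fewer than min_reviews reviews."""
    eligible = [row for row in rows if row[1] >= min_reviews]
    return max(eligible, key=lambda row: row[2])

def trend_slope(rows):
    """Unweighted least-squares slope of the yearly average against the year."""
    years = [row[0] for row in rows]
    values = [row[2] for row in rows]
    mean_year = sum(years) / len(years)
    mean_value = sum(values) / len(values)
    numerator = sum((y - mean_year) * (v - mean_value) for y, v in zip(years, values))
    denominator = sum((y - mean_year) ** 2 for y in years)
    return numerator / denominator

print(peak_year(app_job_support))                  # best year, any sample size
print(peak_year(app_job_support, min_reviews=50))  # best year with at least 50 reviews
print(trend_slope(app_job_support))                # change in the average per year
```

Without a threshold the peak is 2021, which has only 38 reviews; with a minimum of 50 reviews the peak moves to 2016. The slope is slightly negative and close to zero, which is consistent with "does not increase over time".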

Let's analyse SPRINGBOARD:

Which programs are reviewed in the comments:

In [51]:
spring_prog = queries_execute("""SELECT DISTINCT program 
FROM competitive_landscape.comments 
WHERE school = 'springboard' ORDER BY program ASC;""")
spring_prog_df = pd.DataFrame(spring_prog,columns=['Springboard programs reviewed in comments'])
spring_prog_df = spring_prog_df.drop(labels=[0,1], axis=0)
spring_prog_df

Unnamed: 0,Springboard programs reviewed in comments
2,Business Analytics
3,Cybersecurity Career Track
4,Data Analytics Career Track
5,Data Science Career Track
6,Data Science Career Track Prep
7,Digital Marketing Career Track
8,Digital Marketing for Professionals
9,Intermediate Data Science
10,Introduction to Cybersecurity
11,Introduction to Data Science


Now let's evaluate the number of reviews by year associated with Springboard:

In [52]:
spring_total_reviews = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport)
FROM competitive_landscape.comments 
WHERE school = 'springboard' GROUP BY graduatingYear ORDER BY graduatingYear ASC;""")

pd.DataFrame(spring_total_reviews,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,,1,5.0,5.0,5.0,5.0
1,2014.0,1,4.7,5.0,5.0,4.0
2,2015.0,4,4.925,5.0,4.75,5.0
3,2016.0,24,4.741667,4.916667,4.833333,4.35
4,2017.0,67,4.6,4.791045,4.731343,4.142857
5,2018.0,197,4.573604,4.680203,4.596939,4.354167
6,2019.0,169,4.478698,4.650888,4.461538,4.241135
7,2020.0,310,4.532903,4.622581,4.480645,4.475083
8,2021.0,141,4.648936,4.744681,4.51773,4.720588


> The year with the most reviews is 2020, followed by 2018.

> The overall score average is highest in 2015. The overall score average does not increase over time.

> The overall average is highest in 2014 and 2015 (the years with the fewest reviews). The overall average does not increase over time.

> The curriculum average is highest in 2014 (1 review). The curriculum average decreases over time.

> The jobSupport average is highest in 2015. The jobSupport average does not increase over time.

Let's evaluate by course area:

DATA ANALYTICS:

In [53]:
spring_reviews_data = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school = 'springboard' AND program IN ('Business Analytics', 'Data Analytics Career Track')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(spring_reviews_data,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2015.0,1,4.7,5.0,4.0,5.0
1,2017.0,2,5.0,5.0,5.0,5.0
2,2018.0,5,4.1,4.0,4.4,3.333333
3,2019.0,6,3.666667,3.833333,4.0,3.166667
4,2020.0,49,4.516327,4.653061,4.428571,4.44898
5,2021.0,38,4.668421,4.815789,4.605263,4.583333


> The year with the most reviews is 2020.

> The overall score average is highest in 2017 (only 2 reviews). The overall score average does not increase over time.

> The overall average is highest in 2015 and 2017 (the years with the fewest reviews). The overall average does not increase over time.

> The curriculum average is highest in 2017. The curriculum average does not increase over time.

> The jobSupport average is highest in 2015 and 2017 (only 1 and 2 reviews). After a drop in 2018 and 2019, the jobSupport average increases again in 2020 and 2021.

CYBERSECURITY:

In [54]:
spring_reviews_cyber = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school = 'springboard' AND program IN ('Cybersecurity Career Track', 'Introduction to Cybersecurity')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(spring_reviews_cyber,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2018.0,2,4.85,5.0,4.5,5.0


> Regarding cybersecurity, we only have data from 2018 (2 reviews). No insights can be derived from this data.

DATA SCIENCE / MACHINE LEARNING:

In [55]:
spring_reviews_science = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school = 'springboard' AND program IN ('Data Science Career Track', 'Data Science Career Track Prep', 'Intermediate Data Science', 'Introduction to Data Science', 'Machine Learning Engineering Career Track')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(spring_reviews_science,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2015.0,1,5.0,5.0,5.0,5.0
1,2016.0,10,4.8,4.9,4.8,4.625
2,2017.0,28,4.571429,4.714286,4.642857,4.28
3,2018.0,79,4.55443,4.620253,4.518987,4.523077
4,2019.0,56,4.4625,4.642857,4.339286,4.326087
5,2020.0,92,4.478261,4.586957,4.434783,4.388889
6,2021.0,19,4.636842,4.842105,4.263158,4.777778


> The year with the most reviews is 2020.

> The overall score average is highest in 2015 (only 1 review), followed by 2016 (10 reviews). The overall score average does not increase over time.

> The overall average is highest in 2015. The overall average does not increase over time.

> The curriculum average is highest in 2015. The curriculum average decreases over time.

> The jobSupport average is highest in 2015. The jobSupport average does not increase over time.

DIGITAL MARKETING:

In [56]:
spring_reviews_digital = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school = 'springboard' AND program IN ('Digital Marketing Career Track', 'Digital Marketing for Professionals')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(spring_reviews_digital,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2017.0,2,4.85,5.0,5.0,4.5
1,2018.0,6,4.65,4.833333,4.333333,5.0
2,2019.0,12,4.575,4.833333,4.333333,4.444444
3,2020.0,2,4.0,4.5,4.0,3.5


> The year with the most reviews is 2019.

> The overall score average is highest in 2017 (only 2 reviews). The overall score average decreases over time.

> The overall average is highest in 2017. The overall average decreases over time.

> The curriculum average is highest in 2017. The curriculum average decreases over time.

> The jobSupport average is highest in 2018 (6 reviews). The jobSupport average decreases after 2018.

SOFTWARE ENGINEERING:

In [57]:
spring_reviews_soft = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school = 'springboard' AND program IN ('Software Engineering Career Track', 'Software Engineering Career Track Prep Course')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(spring_reviews_soft,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2020.0,8,4.0875,4.125,4.0,4.125
1,2021.0,17,4.676471,4.647059,4.705882,4.875


> Regarding software engineering we only have data from 2020 and 2021. No insights can be derived from this data.

UI/UX DESIGN:

In [58]:
spring_reviews_ux = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school = 'springboard' AND program IN ('UI/UX Design Career Track', 'UX Career Track', 'UX Design')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(spring_reviews_ux,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2015.0,2,5.0,5.0,5.0,5.0
1,2016.0,13,4.676923,4.923077,4.846154,4.166667
2,2017.0,30,4.526667,4.8,4.733333,3.88
3,2018.0,77,4.562338,4.701299,4.644737,4.134615
4,2019.0,54,4.627778,4.814815,4.666667,4.363636
5,2020.0,113,4.60885,4.663717,4.548673,4.605505
6,2021.0,53,4.667925,4.735849,4.566038,4.679245


> The year with the most reviews is 2020.

> The overall score average is highest in 2015 (only 2 reviews). The overall score average does not increase over time.

> The overall average is highest in 2015. The overall average decreases over time.

> The curriculum average is highest in 2015. The curriculum average decreases over time.

> The jobSupport average is highest in 2015 (only 2 reviews). After falling to its lowest value in 2017, the jobSupport average increases every year through 2021.

Let's analyse LE-WAGON:

Which programs are reviewed in the comments:

In [59]:
wagon_prog = queries_execute("""SELECT DISTINCT program 
FROM competitive_landscape.comments 
WHERE school_id = 10868 ORDER BY program ASC;""")
wagon_prog_df = pd.DataFrame(wagon_prog,columns=['Le Wagon programs reviewed in comments'])
wagon_prog_df = wagon_prog_df.drop(labels=[0,1], axis=0)
wagon_prog_df

Unnamed: 0,Le Wagon programs reviewed in comments
2,Data Science - Full-Time
3,FullStack program
4,FullStack program - 35+ locations
5,Web Development Course - Full-Time
6,Web Development Course - Part-Time


Now let's evaluate the number of reviews by year associated with Le Wagon:

In [60]:
wagon_total_reviews = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport)
FROM competitive_landscape.comments 
WHERE school_id = 10868 GROUP BY graduatingYear ORDER BY graduatingYear ASC;""")

pd.DataFrame(wagon_total_reviews,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,,5,5.0,5.0,5.0,5.0
1,2011.0,1,5.0,5.0,5.0,5.0
2,2014.0,16,4.913333,5.0,4.923077,4.769231
3,2015.0,34,4.990909,5.0,5.0,4.9
4,2016.0,94,4.955914,5.0,5.0,4.793651
5,2017.0,323,4.949226,4.993808,4.981308,4.855738
6,2018.0,467,4.946467,4.982869,4.976293,4.863309
7,2019.0,552,4.936232,4.976449,4.967213,4.847195
8,2020.0,411,4.901217,4.978102,4.939173,4.772277
9,2021.0,63,4.934921,5.0,4.952381,4.84127


> The year with the most reviews is 2019, followed by 2018.

> The overall score average is highest in 2011 (1 review), followed by 2015. The overall score average is stable over time.

> The overall average is highest (5.0) in 2011, 2014, 2015, 2016 and 2021. The overall average is stable over time.

> The curriculum average is highest in 2011, 2015 and 2016. The curriculum average is stable over time.

> The jobSupport average is highest in 2011 (1 review). The jobSupport average is stable over time.

Let's evaluate by course area:

DATA SCIENCE:

In [61]:
wagon_data = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school_id = 10868 AND program = 'Data Science - Full-Time'
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(wagon_data,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2020.0,25,4.924,5.0,4.92,4.84
1,2021.0,11,4.972727,5.0,5.0,4.909091


> The year with the most reviews is 2020.

> The overall score, curriculum and jobSupport averages are highest in 2021.

> We only have data for 2020 and 2021.

FULLSTACK:

In [62]:
wagon_full = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school_id = 10868 AND program IN ('FullStack program - 35+ locations', 'FullStack program')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(wagon_full,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,,2,5.0,5.0,5.0,5.0
1,2011.0,1,5.0,5.0,5.0,5.0
2,2014.0,6,4.95,5.0,5.0,4.833333
3,2015.0,9,5.0,5.0,5.0,5.0
4,2016.0,69,4.94058,5.0,5.0,4.783333
5,2017.0,301,4.945515,4.993355,4.979933,4.84507
6,2018.0,457,4.945295,4.982495,4.975824,4.860294
7,2019.0,422,4.949052,4.990521,4.983373,4.857855
8,2020.0,77,4.849351,5.0,4.935065,4.597403


> The year with the most reviews is 2018, followed by 2019.

> The overall score average is highest in 2015 (only 9 reviews) and 2011 (1 review). The overall score average is stable over time.

> The overall, curriculum and jobSupport averages are stable over time.

WEB DEVELOPMENT:

In [63]:
wagon_web = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school_id = 10868 AND program IN ('Web Development Course - Full-Time', 'Web Development Course - Part-Time')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(wagon_web,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2017.0,1,5.0,5.0,5.0,5.0
1,2018.0,2,5.0,5.0,5.0,5.0
2,2019.0,26,4.942308,5.0,4.923077,4.884615
3,2020.0,269,4.91487,4.973978,4.944238,4.814394
4,2021.0,47,4.925532,5.0,4.93617,4.829787


> The year with the most reviews is 2020.

> The overall score average is highest in 2017 and 2018 (only 1 and 2 reviews).

> The overall, curriculum and jobSupport averages are stable over time.

Both Ironhack and Springboard have courses in Data Analytics and UI/UX Design. Let's analyse the differences:

DATA ANALYTICS:

In [64]:
data_reviews = queries_execute("""SELECT graduatingYear,school,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school IN ('springboard', 'ironhack') AND program IN ('Data Analytics Bootcamp', 'Data Analytics Part-Time', 'Business Analytics', 'Data Analytics Career Track')
GROUP BY graduatingYear,school
ORDER BY graduatingYear ASC;""")

pd.DataFrame(data_reviews,columns=['graduatingYear','school','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,school,comments,overallScore,overall,curriculum,jobSupport
0,2015.0,springboard,1,4.7,5.0,4.0,5.0
1,2017.0,springboard,2,5.0,5.0,5.0,5.0
2,2018.0,springboard,5,4.1,4.0,4.4,3.333333
3,2019.0,ironhack,8,4.4125,4.5,4.125,4.625
4,2019.0,springboard,6,3.666667,3.833333,4.0,3.166667
5,2020.0,ironhack,26,4.711538,4.807692,4.730769,4.576923
6,2020.0,springboard,49,4.516327,4.653061,4.428571,4.44898
7,2021.0,springboard,38,4.668421,4.815789,4.605263,4.583333


> The year with the most reviews is 2020 for both schools.

> In 2019 Ironhack had higher scores in all 4 parameters and a higher number of reviews (8 versus 6).

> In 2020 Ironhack had higher scores in all 4 parameters but a lower number of reviews (26 versus 49).

UI/UX DESIGN:

In [65]:
reviews_ux = queries_execute("""SELECT graduatingYear,school,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments
WHERE school IN ('springboard', 'ironhack') AND program IN ('Full-time UX/UI Design Bootcamp', 'Part-time UX/UI Design', 'Part-time UX/UI Design ', 'UX/UI Design Bootcamp', 'UX/UI Design Part-Time', 'UI/UX Design Career Track', 'UX Career Track', 'UX Design')
GROUP BY graduatingYear,school 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(reviews_ux,columns=['graduatingYear','school','comments','overallScore','overall','curriculum','jobSupport'])


Unnamed: 0,graduatingYear,school,comments,overallScore,overall,curriculum,jobSupport
0,2011.0,ironhack,1,4.7,5.0,4.0,5.0
1,2015.0,springboard,2,5.0,5.0,5.0,5.0
2,2016.0,ironhack,7,4.957143,4.857143,5.0,5.0
3,2016.0,springboard,13,4.676923,4.923077,4.846154,4.166667
4,2017.0,ironhack,58,4.818966,4.896552,4.719298,4.839286
5,2017.0,springboard,30,4.526667,4.8,4.733333,3.88
6,2018.0,ironhack,85,4.778824,4.823529,4.72619,4.795181
7,2018.0,springboard,77,4.562338,4.701299,4.644737,4.134615
8,2019.0,ironhack,59,4.652542,4.745763,4.59322,4.610169
9,2019.0,springboard,54,4.627778,4.814815,4.666667,4.363636


> The year with the most reviews is 2018 for Ironhack (85 reviews) versus 2020 for Springboard (113 reviews).

> In 2016 Ironhack had higher scores in all parameters except overall, but with a lower number of reviews (7 versus 13).

> In 2017 Ironhack had higher scores in all parameters except curriculum and a higher number of reviews (58 versus 30).

> In 2018 Ironhack had higher scores in all 4 parameters and a higher number of reviews (85 versus 77).

> In 2019 Ironhack had higher scores in overall score and jobSupport and a higher number of reviews (59 versus 54).

> In 2020 Ironhack had lower scores in all parameters except curriculum and a lower number of reviews (40 versus 113).

> In 2021 Ironhack had a higher score in overall, a similar score in curriculum and a lower number of reviews (7 versus 53).
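
The year-by-year comparison can also be computed rather than read off the table. The sketch below takes the 2016 to 2019 rows displayed above and lists, for each year in which both schools have reviews, the parameters where the first school scores higher. `leading_metrics` is a hypothetical helper written for this illustration; it compares raw averages and takes no account of the number of reviews behind them.

```python
METRICS = ("overallScore", "overall", "curriculum", "jobSupport")

# (graduatingYear, school, comments, overallScore, overall, curriculum, jobSupport)
# copied from the UI/UX table above (years where both schools appear).
ux_rows = [
    (2016, "ironhack", 7, 4.957143, 4.857143, 5.0, 5.0),
    (2016, "springboard", 13, 4.676923, 4.923077, 4.846154, 4.166667),
    (2017, "ironhack", 58, 4.818966, 4.896552, 4.719298, 4.839286),
    (2017, "springboard", 30, 4.526667, 4.8, 4.733333, 3.88),
    (2018, "ironhack", 85, 4.778824, 4.823529, 4.72619, 4.795181),
    (2018, "springboard", 77, 4.562338, 4.701299, 4.644737, 4.134615),
    (2019, "ironhack", 59, 4.652542, 4.745763, 4.59322, 4.610169),
    (2019, "springboard", 54, 4.627778, 4.814815, 4.666667, 4.363636),
]

def leading_metrics(rows, school_a, school_b):
    """For each year with data for both schools, list the metrics
    where school_a has a strictly higher average than school_b."""
    by_year = {}
    for year, school, _comments, *scores in rows:
        by_year.setdefault(year, {})[school] = dict(zip(METRICS, scores))
    leads = {}
    for year in sorted(by_year):
        scores = by_year[year]
        if school_a in scores and school_b in scores:
            leads[year] = [
                m for m in METRICS if scores[school_a][m] > scores[school_b][m]
            ]
    return leads

ux_leads = leading_metrics(ux_rows, "ironhack", "springboard")
for year, metrics in ux_leads.items():
    print(year, metrics)
```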

Both Ironhack and Le-Wagon have courses in Web development. Let's analyse the differences:

WEB DEVELOPMENT:

In [66]:
reviews_web = queries_execute("""SELECT graduatingYear,school_id,school,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments
WHERE school_id IN (10868, 10828) AND program IN ('Full-time Web Development Bootcamp', 'Full-time Web Development Bootcamp ',
  'Part-time Web Development', 'Web Design', 'Web Development Bootcamp',
  'Web Development Part-Time', 'Web Development Course - Full-Time',
  'Web Development Course - Part-Time')
GROUP BY graduatingYear,school_id,school
ORDER BY graduatingYear ASC;""")

pd.DataFrame(reviews_web,columns=['graduatingYear','school id','school','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,school id,school,comments,overallScore,overall,curriculum,jobSupport
0,2014.0,10828,ironhack,8,4.7125,4.75,4.75,4.625
1,2015.0,10828,ironhack,31,4.844444,4.925926,4.705882,4.882353
2,2016.0,10828,ironhack,67,4.925758,4.954545,4.882353,4.882353
3,2017.0,10828,ironhack,127,4.900787,4.92126,4.848,4.943089
4,2017.0,10868,le-wagon,1,5.0,5.0,5.0,5.0
5,2018.0,10828,ironhack,207,4.912077,4.937198,4.859223,4.936275
6,2018.0,10868,le-wagon,2,5.0,5.0,5.0,5.0
7,2019.0,10828,ironhack,120,4.849167,4.866667,4.783333,4.891667
8,2019.0,10868,le-wagon,26,4.942308,5.0,4.923077,4.884615
9,2020.0,10828,ironhack,93,4.852688,4.935484,4.860215,4.75


> The year with the most reviews is 2018 for Ironhack (207 reviews) versus 2020 for le-wagon (269 reviews).

> We don't have review data for le-wagon before 2017.

> In 2017 and 2018 Ironhack had lower scores in all parameters. Nevertheless, it should be taken into consideration that le-wagon only had 1 (2017) and 2 (2018) reviews versus 127 and 207 for Ironhack.

> In 2019 Ironhack had lower scores in all parameters except jobSupport, but with a higher number of reviews (120 versus 26).

> In 2020 Ironhack had lower scores in all parameters and a lower number of reviews (93 versus 269).

> In 2021 Ironhack had a higher score in jobSupport, a similar score in overall and a lower number of reviews (8 versus 47).

# Third Question:

Is there any positive correlation between the number of reviews of a school and some country development indicators?
> Total Population | Population growth (annual %) | Primary completion rate, total (% of relevant age group) | School enrollment, primary (% gross) | School enrollment, secondary (% gross) | School enrollment, primary and secondary (gross), gender parity index (GPI) | Mobile cellular subscriptions (per 100 people) | Individuals using the Internet (% of population) | High-technology exports (% of manufactured exports)

First, let's evaluate school locations:

In [74]:
school_locations_scores = queries_execute("""SELECT school,COUNT(country_name)
FROM competitive_landscape.locations
WHERE country_name <> 'None'
GROUP BY school
ORDER BY school;""")

pd.DataFrame(school_locations_scores,columns=['school','nr. locations'])

Unnamed: 0,school,nr. locations
0,app-academy,2
1,ironhack,9
2,le-wagon,45


> The school with the most locations is le-wagon (45), followed by ironhack (9) and app-academy (2). Let's take a closer look:

In [77]:
school_locations_scores = queries_execute("""SELECT DISTINCT country_name
FROM competitive_landscape.locations
WHERE country_name <> 'None' AND school = 'app-academy'
ORDER BY country_name;""")

pd.DataFrame(school_locations_scores,columns=['country'])

Unnamed: 0,country
0,United States


In [78]:
school_locations_scores = queries_execute("""SELECT DISTINCT country_name
FROM competitive_landscape.locations
WHERE country_name <> 'None' AND school = 'ironhack'
ORDER BY country_name;""")

pd.DataFrame(school_locations_scores,columns=['country'])

Unnamed: 0,country
0,Brazil
1,France
2,Germany
3,Mexico
4,Netherlands
5,Portugal
6,Spain
7,United States


In [76]:
school_locations_scores = queries_execute("""SELECT DISTINCT country_name
FROM competitive_landscape.locations
WHERE country_name <> 'None' AND school = 'le-wagon'
ORDER BY country_name;""")

pd.DataFrame(school_locations_scores,columns=['country'])

Unnamed: 0,country
0,Argentina
1,Australia
2,Belgium
3,Brazil
4,Canada
5,Chile
6,China
7,Denmark
8,England
9,France


> The school with the most locations is le-wagon (45), followed by ironhack (9) and app-academy (2). Some of these locations are in the same country.

> App-academy is only located in the United States.

> Ironhack is located in 8 countries (Brazil, France, Germany, Mexico, Netherlands, Portugal, Spain and United States).

> Le-wagon is located in 28 countries (Argentina, Australia, Belgium, Brazil, Canada, Chile, China, Denmark, England, France, Germany, Indonesia, Israel, Italy, Japan, Mauritius, Mexico, Morocco, Netherlands, Norway, Portugal, Singapore, South Korea, Spain, Sweden, Switzerland, Turkey and United Arab Emirates).

> Springboard only offers online courses.
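
The number of locations and the number of distinct countries can be obtained in one query by combining `COUNT` and `COUNT(DISTINCT ...)`. The sketch below is a minimal illustration on an in-memory SQLite table with invented rows (a stand-in for the MySQL `locations` table, which is assumed here to have at least the `school` and `country_name` columns used above); the same SQL should also be valid in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations (school TEXT, country_name TEXT)")
# Toy rows, invented for illustration; 'None' mimics the placeholder
# string filtered out by the queries above.
conn.executemany("INSERT INTO locations VALUES (?, ?)", [
    ("app-academy", "United States"),
    ("app-academy", "United States"),
    ("ironhack", "Spain"),
    ("ironhack", "Spain"),
    ("ironhack", "France"),
    ("springboard", "None"),
])

sql = """
SELECT school,
       COUNT(country_name)          AS nr_locations,
       COUNT(DISTINCT country_name) AS nr_countries
FROM locations
WHERE country_name <> 'None'
GROUP BY school
ORDER BY school
"""
location_counts = conn.execute(sql).fetchall()
print(location_counts)
```

A school with several campuses in one country then shows more locations than countries, which is the difference between the 45 locations and the 28 countries reported for le-wagon.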

In [84]:
school_locations_scores = queries_execute("""SELECT comments.school,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport),country_info.tot_country
FROM competitive_landscape.comments
LEFT JOIN (SELECT school, COUNT(country_name) AS tot_country
FROM competitive_landscape.locations
WHERE country_name <> 'None'
GROUP BY school) AS country_info
ON comments.school = country_info.school
GROUP BY comments.school, country_info.tot_country;""")

pd.DataFrame(school_locations_scores,columns=['school','reviews','avg overall score','avg overall','avg curriculum','avg job support','nr. locations'])

Unnamed: 0,school,reviews,avg overall score,avg overall,avg curriculum,avg job support,nr. locations
0,app-academy,1046,4.602301,4.691643,4.603282,4.478261,2.0
1,ironhack,1045,4.825,4.874038,4.783944,4.808809,9.0
2,le-wagon,1966,4.935303,4.983698,4.966422,4.833519,45.0
3,springboard,914,4.562363,4.681619,4.537788,4.430348,


> More locations do not strictly mean more reviews: ironhack has 9 locations and 1045 reviews, while app-academy has 2 locations and 1046 reviews.

> All the parameters increase with the number of locations.

> Schools with more locations have higher scores.
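
To put a number on these observations, a Pearson correlation coefficient can be computed from the table above. The sketch below implements the standard formula by hand on the three schools that have physical locations; `pearson` is a helper written for this illustration. With only three data points the coefficient is purely descriptive and should not be read as evidence of a real relationship.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Values copied from the table above, in the order app-academy, ironhack, le-wagon.
# Springboard is left out because it has no physical locations.
locations = [2, 9, 45]
reviews = [1046, 1045, 1966]
overall_score = [4.602301, 4.825, 4.935303]

r_reviews = pearson(locations, reviews)
r_score = pearson(locations, overall_score)
print(r_reviews, r_score)
```

Both coefficients come out positive. The one for reviews is driven almost entirely by le-wagon, since app-academy and ironhack have nearly the same number of reviews.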

Now let's compare the average of some indicators in the countries where the schools are located:

In [95]:
iron_indicatores = queries_execute("""SELECT locations.school, indicator, unit, AVG(year_2018)
FROM competitive_landscape.locations
LEFT JOIN (SELECT country_name, indicator, unit, year_2018
FROM competitive_landscape.country_data) AS indic_info
ON locations.country_name = indic_info.country_name
WHERE (locations.school = 'ironhack')
GROUP BY indicator
ORDER BY locations.country_name;""")

iron_ind_df = pd.DataFrame(iron_indicatores,columns=['school','indicator','unit','avr in 2018'])
iron_ind_df

Unnamed: 0,school,indicator,unit,avr in 2018
0,ironhack,,,
1,ironhack,High-technology exports (% of manufactured exp...,Unit (0),15.222222
2,ironhack,Individuals using the Internet (% of population),Unit (0.0),81.466667
3,ironhack,Mobile cellular subscriptions (per 100 people),Unit (0.0),114.8
4,ironhack,"School enrollment, primary and secondary (gros...",Unit (0),1.0
5,ironhack,"School enrollment, secondary (% gross)",Unit (0),114.125
6,ironhack,"School enrollment, primary (% gross)",Unit (0.0),103.6375
7,ironhack,"Primary completion rate, total (% of relevant ...",Unit (0),100.166667
8,ironhack,Population growth (annual %),Unit (0.0),0.455556
9,ironhack,"Population, total",Millions (0.00),103.704444


In [96]:
wagon_indicatores = queries_execute("""SELECT locations.school, indicator, unit, AVG(year_2018)
FROM competitive_landscape.locations
LEFT JOIN (SELECT country_name, indicator, unit, year_2018
FROM competitive_landscape.country_data) AS indic_info
ON locations.country_name = indic_info.country_name
WHERE (locations.school = 'le-wagon')
GROUP BY indicator
ORDER BY locations.country_name;""")

wagon_ind_df = pd.DataFrame(wagon_indicatores,columns=['school','indicator','unit','avr in 2018'])
wagon_ind_df

Unnamed: 0,school,indicator,unit,avr in 2018
0,le-wagon,High-technology exports (% of manufactured exp...,Unit (0),16.883721
1,le-wagon,Individuals using the Internet (% of population),Unit (0.0),79.793023
2,le-wagon,Mobile cellular subscriptions (per 100 people),Unit (0.0),119.65814
3,le-wagon,"School enrollment, primary and secondary (gros...",Unit (0),1.0
4,le-wagon,"School enrollment, secondary (% gross)",Unit (0),110.333333
5,le-wagon,"School enrollment, primary (% gross)",Unit (0.0),103.731579
6,le-wagon,"Primary completion rate, total (% of relevant ...",Unit (0),99.15
7,le-wagon,Population growth (annual %),Unit (0.0),0.606977
8,le-wagon,"Population, total",Millions (0.00),158.773953
9,le-wagon,,,


In [97]:
app_indicatores = queries_execute("""SELECT locations.school, indicator, unit, AVG(year_2018)
FROM competitive_landscape.locations
LEFT JOIN (SELECT country_name, indicator, unit, year_2018
FROM competitive_landscape.country_data) AS indic_info
ON locations.country_name = indic_info.country_name
WHERE (locations.school = 'app-academy')
GROUP BY indicator
ORDER BY locations.country_name;""")

app_ind_df = pd.DataFrame(app_indicatores,columns=['school','indicator','unit','avr in 2018'])
app_ind_df

Unnamed: 0,school,indicator,unit,avr in 2018
0,app-academy,,,
1,app-academy,High-technology exports (% of manufactured exp...,Unit (0),19.0
2,app-academy,Individuals using the Internet (% of population),Unit (0.0),88.5
3,app-academy,Mobile cellular subscriptions (per 100 people),Unit (0.0),129.0
4,app-academy,"School enrollment, primary and secondary (gros...",Unit (0),1.0
5,app-academy,"School enrollment, secondary (% gross)",Unit (0),99.0
6,app-academy,"School enrollment, primary (% gross)",Unit (0.0),101.3
7,app-academy,"Primary completion rate, total (% of relevant ...",Unit (0),100.0
8,app-academy,Population growth (annual %),Unit (0.0),0.5
9,app-academy,"Population, total",Millions (0.00),326.69
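
The three queries above differ only in the school name, so they could be merged into a single query grouped by school and indicator. The sketch below shows the idea on in-memory SQLite tables with invented values (stand-ins for the MySQL `locations` and `country_data` tables). It uses an inner join, on the assumption that the empty indicator row in the outputs above comes from locations whose `country_name` has no match in `country_data`; that cause is a guess, not something the outputs confirm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations (school TEXT, country_name TEXT);
CREATE TABLE country_data (country_name TEXT, indicator TEXT, unit TEXT, year_2018 REAL);
""")
# Toy rows, invented for illustration.
conn.executemany("INSERT INTO locations VALUES (?, ?)", [
    ("school-a", "Spain"),
    ("school-a", "France"),
    ("school-a", "None"),      # no match in country_data
    ("school-b", "France"),
])
conn.executemany("INSERT INTO country_data VALUES (?, ?, ?, ?)", [
    ("Spain", "Population growth (annual %)", "Unit (0.0)", 0.5),
    ("France", "Population growth (annual %)", "Unit (0.0)", 0.25),
])

sql = """
SELECT l.school, c.indicator, c.unit, AVG(c.year_2018) AS avg_2018
FROM locations AS l
JOIN country_data AS c ON l.country_name = c.country_name
GROUP BY l.school, c.indicator, c.unit
ORDER BY c.indicator, l.school
"""
indicator_rows = conn.execute(sql).fetchall()
for row in indicator_rows:
    print(row)
```

As in the original queries, the average is taken over locations, so a country with several campuses counts once per campus.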


In [98]:
data_concat = pd.concat([iron_ind_df,wagon_ind_df,app_ind_df])
data_concat.sort_values(by=['indicator'])

Unnamed: 0,school,indicator,unit,avr in 2018
1,ironhack,High-technology exports (% of manufactured exp...,Unit (0),15.222222
1,app-academy,High-technology exports (% of manufactured exp...,Unit (0),19.0
0,le-wagon,High-technology exports (% of manufactured exp...,Unit (0),16.883721
2,ironhack,Individuals using the Internet (% of population),Unit (0.0),81.466667
2,app-academy,Individuals using the Internet (% of population),Unit (0.0),88.5
1,le-wagon,Individuals using the Internet (% of population),Unit (0.0),79.793023
3,ironhack,Mobile cellular subscriptions (per 100 people),Unit (0.0),114.8
3,app-academy,Mobile cellular subscriptions (per 100 people),Unit (0.0),129.0
2,le-wagon,Mobile cellular subscriptions (per 100 people),Unit (0.0),119.65814
8,ironhack,Population growth (annual %),Unit (0.0),0.455556


> The school with the most locations is le-wagon (45), followed by ironhack (9) and app-academy (2). Some of these locations are in the same country. Springboard only offers online courses.

> The school located in the most countries is le-wagon (28), followed by ironhack (8) and app-academy (1). More locations do not strictly mean more reviews. All the review parameters increase with the number of locations: schools with more locations have higher scores.

Is there any positive correlation between the number of reviews of a school and some country development indicators?

Let's first analyse the indicators related to country population:

> Total Population (Millions): App-academy is only located in the US, which has a larger population than most of the other countries. As a result, app-academy (326.690000) has a higher total population average across its locations than le-wagon (158.773953) and ironhack (103.704444).

> Population growth (annual %): Le-wagon (0.606977) has a higher annual population growth average across its locations than app-academy (0.500000) and ironhack (0.455556).

> According to these indicators, Le-wagon and app-academy have a larger pool of potential new students.


And regarding education indicators:

> Primary completion rate, total (%): ironhack (100.166667) has a higher primary completion rate average across its locations than app-academy (100.000000) and le-wagon (99.150000).

> School enrollment, primary (% gross): le-wagon (103.731579) has a higher primary school enrollment average across its locations than ironhack (103.637500) and app-academy (101.300000).

> School enrollment, secondary (% gross): ironhack (114.125000) has a higher secondary school enrollment average across its locations than le-wagon (110.333333) and app-academy (99.000000).

> School enrollment, primary and secondary (gross), gender parity index (GPI): all three schools have an average gender parity index for school enrollment of 1.0.

> Even if Le-wagon and app-academy have a larger pool of potential new students according to the population indicators, ironhack is located in countries with higher primary completion and secondary school enrollment rates.

Last but not least, technology indicators:

> Mobile cellular subscriptions (per 100 people): app-academy (129.000000) has a higher mobile cellular subscriptions average across its locations than le-wagon (119.658140) and ironhack (114.800000).

> Individuals using the Internet (% of population): app-academy (88.500000) has a higher average share of individuals using the Internet across its locations than ironhack (81.466667) and le-wagon (79.793023).

> High-technology exports (% of manufactured exports): app-academy (19.000000) has a higher high-technology exports average across its locations than le-wagon (16.883721) and ironhack (15.222222).

> According to these indicators, app-academy is more likely to reach new students with access to technology.

How can this information be related to the number of reviews?

> Le-wagon has the highest number of reviews and the highest number of locations. Its locations also have a higher annual population growth and total population average than ironhack's.

> Education and technology indicators do not seem to be associated with the number of reviews.