# Competitive Landscape SQL Project

You were hired by Ironhack to perform an Analytics Consulting Project entitled: competitive landscape.

Your mission is to create and populate an appropriate database with many coding schools that are our competition, and to design suitable queries that answer business questions of interest (to be defined by you).

Bonus: How will this data model be updated in the future? Please write auxiliary functions that test the database for data quality issues. For example: how could you make sure you only include the most recent comments when you re-run the script?

Crucial hint: check out the following tutorial: https://www.dataquest.io/blog/sql-insert-tutorial/
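
One possible answer to the bonus question is to make the load step idempotent, so that re-running the script only adds comments that are not stored yet. The sketch below is an assumption-laden illustration: it uses an in-memory SQLite database as a stand-in for MySQL, and the `created_at` column and the sample rows are hypothetical. In MySQL a similar effect could be obtained with `INSERT IGNORE` on a table whose `comment_id` is the primary key.

```python
import sqlite3

def insert_new_comments(conn, rows):
    """Insert only comments whose comment_id is not stored yet; return how many were added."""
    before = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
    # INSERT OR IGNORE skips rows that would violate the PRIMARY KEY on comment_id.
    conn.executemany(
        "INSERT OR IGNORE INTO comments (comment_id, school, created_at) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
    return after - before

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, school TEXT, created_at TEXT)"
)

first_run = [(1, "ironhack", "2021-01-10"), (2, "springboard", "2021-02-03")]
second_run = first_run + [(3, "app-academy", "2021-03-15")]  # re-run with one new comment

added_first = insert_new_comments(conn, first_run)
added_second = insert_new_comments(conn, second_run)
```

Because the primary key rejects duplicates, the second run adds only the one comment that was not already in the table.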

# Our Queries

In [1]:
import mysql.connector

In [None]:
from getpass import getpass  # prompts for the password without echoing it
cnx = mysql.connector.connect(user="root", password=getpass("password: "), host="localhost", database="competitive_landscape")

In [3]:
cnx.is_connected()

True

In [4]:
cursor = cnx.cursor()
cursor

<mysql.connector.cursor.MySQLCursor at 0x26882a90520>

In [5]:
def queries_execute(x):
    """Run the SQL string x on the shared cursor and return all rows as a list of tuples."""
    cursor.execute(x)
    queries_result = cursor.fetchall()
    return queries_result
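
The helper above returns bare tuples, so every later cell has to repeat the column names by hand. As a hedged alternative, the column names can be read from `cursor.description`, which is part of the Python DB-API. The function name `query_to_df` is hypothetical, and the demo uses an in-memory SQLite database only so that it runs without a MySQL server; note that SQLite uses `?` placeholders while `mysql.connector` uses `%s`.

```python
import sqlite3
import pandas as pd

def query_to_df(cursor, query, params=()):
    """Run a query and return the rows as a DataFrame labelled with the database's column names."""
    cursor.execute(query, params)
    rows = cursor.fetchall()
    # cursor.description holds one entry per result column; the name comes first.
    columns = [col[0] for col in cursor.description]
    return pd.DataFrame(rows, columns=columns)

# Stand-in database with made-up rows, for illustration only.
demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE comments (comment_id INTEGER, school TEXT)")
demo_cursor.executemany(
    "INSERT INTO comments VALUES (?, ?)", [(1, "ironhack"), (2, "ironhack")]
)

demo_df = query_to_df(
    demo_cursor,
    "SELECT school, COUNT(comment_id) AS comments FROM comments GROUP BY school",
)
```

With this approach the `AS` aliases in the SQL become the DataFrame headers, so the query and the displayed column names cannot drift apart.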

In [6]:
import pandas as pd

# First Question: Correlation between graduating year and review scores

Let's analyse IRONHACK first.

Which programs are reviewed in the comments:

In [7]:
iron_prog = queries_execute("""SELECT DISTINCT program 
FROM competitive_landscape.comments 
WHERE school = 'ironhack' ORDER BY program ASC;""")
iron_prog_df = pd.DataFrame(iron_prog,columns=['Ironhack programs reviewed in comments'])
# Drop the first two rows (index labels 0 and 1)
iron_prog_df = iron_prog_df.drop(labels=[0,1], axis=0)
iron_prog_df

Unnamed: 0,Ironhack programs reviewed in comments
2,Data Analytics Bootcamp
3,Data Analytics Part-Time
4,Full-time UX/UI Design Bootcamp
5,Full-time Web Development Bootcamp
6,Full-time Web Development Bootcamp
7,Part-time UX/UI Design
8,Part-time UX/UI Design
9,Part-time Web Development
10,UX/UI Design Bootcamp
11,UX/UI Design Part-Time
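
Two program names appear twice in the list above. The later queries in this notebook match both `'Part-time UX/UI Design'` and `'Part-time UX/UI Design '`, which suggests the duplicates differ only by a trailing space. A minimal sketch of how such names could be normalised in pandas follows; the sample values are typed in by hand rather than read from the database.

```python
import pandas as pd

# Program names as they might come back from the query:
# each pair differs only by a trailing space.
programs = pd.Series([
    "Full-time Web Development Bootcamp",
    "Full-time Web Development Bootcamp ",
    "Part-time UX/UI Design",
    "Part-time UX/UI Design ",
])

# Strip surrounding whitespace, then keep each name once.
clean_programs = programs.str.strip().drop_duplicates().reset_index(drop=True)
```

The same clean-up could also be done on the SQL side, for example by selecting `DISTINCT TRIM(program)`.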


Now let's evaluate the number of reviews per graduating year for Ironhack:

In [8]:
iron_total_reviews = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),
AVG(overall),AVG(curriculum),AVG(jobSupport)
FROM competitive_landscape.comments 
WHERE school = 'ironhack' GROUP BY graduatingYear ORDER BY graduatingYear ASC;""")

pd.DataFrame(iron_total_reviews,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2011.0,1,4.7,5.0,4.0,5.0
1,2014.0,9,4.711111,4.777778,4.666667,4.666667
2,2015.0,32,4.85,4.928571,4.722222,4.888889
3,2016.0,75,4.92973,4.945946,4.896552,4.896552
4,2017.0,198,4.871717,4.914141,4.805128,4.901042
5,2018.0,309,4.871197,4.906149,4.820847,4.887789
6,2019.0,217,4.765438,4.801843,4.700461,4.796296
7,2020.0,184,4.728804,4.820652,4.777174,4.576923
8,2021.0,20,4.815,4.95,4.8,4.7


> The year with the most reviews is 2018 (309), followed by 2019 (217).

> The overall score average is highest in 2016 (4.93). It does not increase steadily over time.

> The overall average is highest in 2011 (5.0, from a single review). It does not increase steadily over time.

> The curriculum average is highest in 2016 (4.90). It does not increase steadily over time.

> The jobSupport average is highest in 2011 (5.0, from a single review); among years with more than one review, 2017 leads (4.90). It does not increase steadily over time.
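
Since this question is about correlation, the visual reading above can be complemented with a coefficient. The sketch below computes the Pearson correlation between graduating year and the yearly overallScore average, using figures copied from the output above. Treating each year as a single data point and the cut-off of 10 reviews are both simplifying assumptions, not part of the original analysis.

```python
import pandas as pd

# Yearly Ironhack figures copied from the output above.
iron_yearly = pd.DataFrame({
    "graduatingYear": [2011, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021],
    "comments": [1, 9, 32, 75, 198, 309, 217, 184, 20],
    "overallScore": [4.7, 4.711111, 4.85, 4.92973, 4.871717,
                     4.871197, 4.765438, 4.728804, 4.815],
})

# Pearson correlation between year and yearly mean score (each year counts once).
r_all_years = iron_yearly["graduatingYear"].corr(iron_yearly["overallScore"])

# Same measure, ignoring years with fewer than 10 reviews (arbitrary threshold).
enough_reviews = iron_yearly[iron_yearly["comments"] >= 10]
r_filtered = enough_reviews["graduatingYear"].corr(enough_reviews["overallScore"])
```

On these figures the first coefficient is positive and the second is negative, which shows how much the years with very few reviews influence the apparent trend.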


Let's evaluate by course area:

DATA ANALYTICS:

In [9]:
iron_reviews_data = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school = 'ironhack' AND program IN ('Data Analytics Bootcamp', 'Data Analytics Part-Time')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(iron_reviews_data,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2019.0,8,4.4125,4.5,4.125,4.625
1,2020.0,26,4.711538,4.807692,4.730769,4.576923


> The year with the most reviews is 2020 (26, versus 8 in 2019). We only have comments for 2 different graduating years.

> The overall score average is higher in 2020, so it increases over time.

> The overall average is higher in 2020, so it increases over time.

> The curriculum average is higher in 2020, so it increases over time.

> The jobSupport average is higher in 2019, so it decreases over time.
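
The per-area queries in this notebook all share the same shape and differ only in the school and the list of programs. A hedged sketch of how that could be factored into one helper is shown below. The function name `build_yearly_query` is hypothetical; it relies on the `%s` placeholder style that `mysql.connector` accepts in `cursor.execute(sql, params)`, and it assumes the program list is not empty.

```python
def build_yearly_query(school, programs):
    """Return (sql, params) for yearly review counts and average scores of one school."""
    # One %s placeholder per program; the values travel separately from the SQL text.
    placeholders = ", ".join(["%s"] * len(programs))
    sql = (
        "SELECT graduatingYear, COUNT(comment_id), AVG(overallScore), "
        "AVG(overall), AVG(curriculum), AVG(jobSupport) "
        "FROM competitive_landscape.comments "
        f"WHERE school = %s AND program IN ({placeholders}) "
        "GROUP BY graduatingYear ORDER BY graduatingYear ASC"
    )
    return sql, (school, *programs)

sql, params = build_yearly_query(
    "ironhack", ["Data Analytics Bootcamp", "Data Analytics Part-Time"]
)
# With the notebook's open cursor this could be run as: cursor.execute(sql, params)
```

Passing the values as parameters also avoids quoting problems with program names that contain apostrophes or trailing spaces.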

UX/UI DESIGN:

In [10]:
iron_reviews_ux = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments
WHERE school = 'ironhack' AND program IN ('Full-time UX/UI Design Bootcamp', 'Part-time UX/UI Design', 'Part-time UX/UI Design ', 'UX/UI Design Bootcamp', 'UX/UI Design Part-Time')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")


pd.DataFrame(iron_reviews_ux,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])


Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2011.0,1,4.7,5.0,4.0,5.0
1,2016.0,7,4.957143,4.857143,5.0,5.0
2,2017.0,58,4.818966,4.896552,4.719298,4.839286
3,2018.0,85,4.778824,4.823529,4.72619,4.795181
4,2019.0,59,4.652542,4.745763,4.59322,4.610169
5,2020.0,40,4.5625,4.575,4.675,4.435897
6,2021.0,7,4.614286,4.857143,4.571429,4.428571


> The year with the most reviews is 2018 (85).

> The overall score average is highest in 2016 (7 reviews). It decreases every year from 2016 to 2020, with a slight recovery in 2021.

> The overall average is highest in 2011 (only 1 review). It does not increase over time.

> The curriculum average is highest in 2016. It tends to decrease over time, though not in every year.

> The jobSupport average is highest in 2011 and 2016 (both 5.0). It decreases every year from 2016 to 2021.
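
Several of the "best years" above rest on a handful of reviews. One way to make that explicit is to require a minimum number of reviews before a year can be named the best. The sketch below uses the Ironhack UX/UI figures copied from the output above; the helper `best_year` and the threshold of 10 reviews are illustrative assumptions.

```python
import pandas as pd

# Ironhack UX/UI yearly figures copied from the output above.
ux_yearly = pd.DataFrame({
    "graduatingYear": [2011, 2016, 2017, 2018, 2019, 2020, 2021],
    "comments": [1, 7, 58, 85, 59, 40, 7],
    "overall": [5.0, 4.857143, 4.896552, 4.823529, 4.745763, 4.575, 4.857143],
})

def best_year(df, column, min_reviews=1):
    """Return the year with the highest average in `column`, among years with enough reviews."""
    eligible = df[df["comments"] >= min_reviews]
    return int(eligible.loc[eligible[column].idxmax(), "graduatingYear"])

best_any = best_year(ux_yearly, "overall")                     # every year counts
best_robust = best_year(ux_yearly, "overall", min_reviews=10)  # arbitrary threshold
```

Without a threshold the answer is 2011, which has a single review; with at least 10 reviews required, it becomes 2017.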

WEB DEVELOPMENT:

In [11]:
iron_web_reviews = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school = 'ironhack' AND program IN ('Full-time Web Development Bootcamp', 'Full-time Web Development Bootcamp ', 'Part-time Web Development', 'Web Design', 'Web Development Bootcamp', 'Web Development Part-Time')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(iron_web_reviews,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2014.0,8,4.7125,4.75,4.75,4.625
1,2015.0,31,4.844444,4.925926,4.705882,4.882353
2,2016.0,67,4.925758,4.954545,4.882353,4.882353
3,2017.0,127,4.900787,4.92126,4.848,4.943089
4,2018.0,207,4.912077,4.937198,4.859223,4.936275
5,2019.0,120,4.849167,4.866667,4.783333,4.891667
6,2020.0,93,4.852688,4.935484,4.860215,4.75
7,2021.0,8,4.9125,5.0,4.875,4.875


> The year with the most reviews is 2018 (207).

> The overall score average is highest in 2016. It does not increase steadily over time.

> The overall average is highest in 2021 (5.0, from 8 reviews). It does not increase steadily over time.

> The curriculum average is highest in 2016. It does not increase steadily over time.

> The jobSupport average is highest in 2017. It does not increase steadily over time.

Let's analyse APP-ACADEMY:

Which programs are reviewed in the comments:

In [12]:
app_prog = queries_execute("""SELECT DISTINCT program 
FROM competitive_landscape.comments 
WHERE school = 'app-academy' ORDER BY program ASC;""")
app_prog_df = pd.DataFrame(app_prog,columns=['App-academy programs reviewed in comments'])
# Drop the first two rows (index labels 0 and 1)
app_prog_df = app_prog_df.drop(labels=[0,1], axis=0)
app_prog_df

Unnamed: 0,App-academy programs reviewed in comments
2,App Academy Open
3,Bootcamp Prep
4,Software Engineer Track: In-Person
5,Software Engineer Track: Online


Now let's evaluate the number of reviews per graduating year for App Academy:

In [13]:
app_total_reviews = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport)
FROM competitive_landscape.comments 
WHERE school = 'app-academy' GROUP BY graduatingYear ORDER BY graduatingYear ASC;""")

pd.DataFrame(app_total_reviews,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,,3,3.8,3.666667,4.666667,3.0
1,2013.0,11,4.9,5.0,4.875,4.666667
2,2014.0,16,4.8375,4.875,4.923077,4.615385
3,2015.0,17,4.54375,4.625,4.8125,4.1875
4,2016.0,93,4.812903,4.870968,4.880435,4.681818
5,2017.0,232,4.651948,4.695652,4.705628,4.519048
6,2018.0,203,4.671921,4.704433,4.714286,4.566667
7,2019.0,223,4.534081,4.650224,4.509009,4.422535
8,2020.0,210,4.427619,4.614286,4.328571,4.330144
9,2021.0,37,4.694595,4.783784,4.540541,4.742857


> App Academy only has courses related to software engineering.

> The year with the most reviews is 2017 (232), followed by 2019 (223).

> The overall score average is highest in 2013 (4.9). It does not increase over time.

> The overall average is highest in 2013 (5.0). It tends to decrease through 2020, though not in every year, and recovers in 2021.

> The curriculum average is highest in 2014 (4.92). It tends to decrease over time, with a partial recovery in 2021.

> The jobSupport average is highest in 2021 (4.74), followed by 2016 (4.68). It does not increase steadily over time.

Let's analyse SPRINGBOARD:

Which programs are reviewed in the comments:

In [14]:
spring_prog = queries_execute("""SELECT DISTINCT program 
FROM competitive_landscape.comments 
WHERE school = 'springboard' ORDER BY program ASC;""")
spring_prog_df = pd.DataFrame(spring_prog,columns=['Springboard programs reviewed in comments'])
# Drop the first two rows (index labels 0 and 1)
spring_prog_df = spring_prog_df.drop(labels=[0,1], axis=0)
spring_prog_df

Unnamed: 0,Springboard programs reviewed in comments
2,Business Analytics
3,Cybersecurity Career Track
4,Data Analytics Career Track
5,Data Science Career Track
6,Data Science Career Track Prep
7,Digital Marketing Career Track
8,Digital Marketing for Professionals
9,Intermediate Data Science
10,Introduction to Cybersecurity
11,Introduction to Data Science


Now let's evaluate the number of reviews per graduating year for Springboard:

In [15]:
spring_total_reviews = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport)
FROM competitive_landscape.comments 
WHERE school = 'springboard' GROUP BY graduatingYear ORDER BY graduatingYear ASC;""")

pd.DataFrame(spring_total_reviews,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,,1,5.0,5.0,5.0,5.0
1,2014.0,1,4.7,5.0,5.0,4.0
2,2015.0,4,4.925,5.0,4.75,5.0
3,2016.0,24,4.741667,4.916667,4.833333,4.35
4,2017.0,67,4.6,4.791045,4.731343,4.142857
5,2018.0,197,4.573604,4.680203,4.596939,4.354167
6,2019.0,169,4.478698,4.650888,4.461538,4.241135
7,2020.0,310,4.532903,4.622581,4.480645,4.475083
8,2021.0,138,4.648551,4.746377,4.521739,4.714286


> The year with the most reviews is 2020 (310), followed by 2018 (197).

> Leaving aside the single review with no graduating year, the overall score average is highest in 2015 (4.93, from 4 reviews). It does not increase over time.

> The overall average is highest in 2014 and 2015 (the years with the fewest reviews). It does not increase over time.

> The curriculum average is highest in 2014 (1 review). The curriculum average tends to decrease over time.

> The jobSupport average is highest in 2015. It does not increase steadily over time.

Let's evaluate by course area:

DATA ANALYTICS:

In [16]:
spring_reviews_data = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school = 'springboard' AND program IN ('Business Analytics', 'Data Analytics Career Track')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(spring_reviews_data,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2015.0,1,4.7,5.0,4.0,5.0
1,2017.0,2,5.0,5.0,5.0,5.0
2,2018.0,5,4.1,4.0,4.4,3.333333
3,2019.0,6,3.666667,3.833333,4.0,3.166667
4,2020.0,49,4.516327,4.653061,4.428571,4.44898
5,2021.0,37,4.678378,4.837838,4.621622,4.571429


> The year with the most reviews is 2020 (49).

> The overall score average is highest in 2017 (only 2 reviews). It does not increase steadily over time.

> The overall average is highest in 2015 and 2017 (1 and 2 reviews respectively). It does not increase steadily over time.

> The curriculum average is highest in 2017. It does not increase steadily over time.

> The jobSupport average is highest in 2015 and 2017 (5.0, from 1 and 2 reviews). It drops in 2018 and 2019, then increases in 2020 and 2021.

CYBERSECURITY:

In [17]:
spring_reviews_cyber = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school = 'springboard' AND program IN ('Cybersecurity Career Track', 'Introduction to Cybersecurity')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(spring_reviews_cyber,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2018.0,2,4.85,5.0,4.5,5.0


> Regarding cybersecurity, we only have data from 2018 (2 reviews). No insights can be derived from this data.

DATA SCIENCE / MACHINE LEARNING:

In [18]:
spring_reviews_science = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school = 'springboard' AND program IN ('Data Science Career Track', 'Data Science Career Track Prep', 'Intermediate Data Science', 'Introduction to Data Science', 'Machine Learning Engineering Career Track')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(spring_reviews_science,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2015.0,1,5.0,5.0,5.0,5.0
1,2016.0,10,4.8,4.9,4.8,4.625
2,2017.0,28,4.571429,4.714286,4.642857,4.28
3,2018.0,79,4.55443,4.620253,4.518987,4.523077
4,2019.0,56,4.4625,4.642857,4.339286,4.326087
5,2020.0,92,4.478261,4.586957,4.434783,4.388889
6,2021.0,19,4.636842,4.842105,4.263158,4.777778


> The year with the most reviews is 2020 (92).

> The overall score average is highest in 2015 (only 1 review), followed by 2016 (10 reviews). It does not increase over time.

> The overall average is highest in 2015. It does not increase over time.

> The curriculum average is highest in 2015. It tends to decrease over time.

> The jobSupport average is highest in 2015. It does not increase steadily over time.

DIGITAL MARKETING:

In [19]:
spring_reviews_digital = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school = 'springboard' AND program IN ('Digital Marketing Career Track', 'Digital Marketing for Professionals')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(spring_reviews_digital,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2017.0,2,4.85,5.0,5.0,4.5
1,2018.0,6,4.65,4.833333,4.333333,5.0
2,2019.0,12,4.575,4.833333,4.333333,4.444444
3,2020.0,2,4.0,4.5,4.0,3.5


> The year with the most reviews is 2019 (12).

> The overall score average is highest in 2017 (only 2 reviews). It decreases over time.

> The overall average is highest in 2017. It decreases over time.

> The curriculum average is highest in 2017. It decreases over time.

> The jobSupport average is highest in 2018 (5.0, versus 4.5 in 2017). It decreases from 2018 onwards.

SOFTWARE ENGINEERING:

In [20]:
spring_reviews_soft = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school = 'springboard' AND program IN ('Software Engineering Career Track', 'Software Engineering Career Track Prep Course')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(spring_reviews_soft,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2020.0,8,4.0875,4.125,4.0,4.125
1,2021.0,17,4.676471,4.647059,4.705882,4.875


> Regarding software engineering, we only have data from 2020 and 2021 (8 and 17 reviews). No insights can be derived from this data.

UI/UX DESIGN:

In [21]:
spring_reviews_ux = queries_execute("""SELECT graduatingYear,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school = 'springboard' AND program IN ('UI/UX Design Career Track', 'UX Career Track', 'UX Design')
GROUP BY graduatingYear 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(spring_reviews_ux,columns=['graduatingYear','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,comments,overallScore,overall,curriculum,jobSupport
0,2015.0,2,5.0,5.0,5.0,5.0
1,2016.0,13,4.676923,4.923077,4.846154,4.166667
2,2017.0,30,4.526667,4.8,4.733333,3.88
3,2018.0,77,4.562338,4.701299,4.644737,4.134615
4,2019.0,54,4.627778,4.814815,4.666667,4.363636
5,2020.0,113,4.60885,4.663717,4.548673,4.605505
6,2021.0,52,4.667308,4.730769,4.576923,4.673077


> The year with the most reviews is 2020 (113).

> The overall score average is highest in 2015 (only 2 reviews). It does not increase steadily over time.

> The overall average is highest in 2015. It tends to decrease over time, though not in every year.

> The curriculum average is highest in 2015. It tends to decrease over time, though not in every year.

> The jobSupport average is highest in 2015 (5.0, from 2 reviews). After a low in 2017 (3.88), it increases every year through 2021.

Both Ironhack and Springboard have courses in Data Analytics and UI/UX Design. Let's analyse the differences:

DATA ANALYTICS:

In [22]:
data_reviews = queries_execute("""SELECT graduatingYear,school,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments 
WHERE school IN ('springboard', 'ironhack') AND program IN ('Data Analytics Bootcamp', 'Data Analytics Part-Time', 'Business Analytics', 'Data Analytics Career Track')
GROUP BY graduatingYear,school
ORDER BY graduatingYear ASC;""")

pd.DataFrame(data_reviews,columns=['graduatingYear','school','comments','overallScore','overall','curriculum','jobSupport'])

Unnamed: 0,graduatingYear,school,comments,overallScore,overall,curriculum,jobSupport
0,2015.0,springboard,1,4.7,5.0,4.0,5.0
1,2017.0,springboard,2,5.0,5.0,5.0,5.0
2,2018.0,springboard,5,4.1,4.0,4.4,3.333333
3,2019.0,ironhack,8,4.4125,4.5,4.125,4.625
4,2019.0,springboard,6,3.666667,3.833333,4.0,3.166667
5,2020.0,ironhack,26,4.711538,4.807692,4.730769,4.576923
6,2020.0,springboard,49,4.516327,4.653061,4.428571,4.44898
7,2021.0,springboard,37,4.678378,4.837838,4.621622,4.571429


> The year with the most reviews is 2020 in both schools.

> In 2019, Ironhack had higher scores in all 4 parameters and more reviews (8 versus 6).

> In 2020, Ironhack had higher scores in all 4 parameters but fewer reviews (26 versus 49).
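
A side-by-side layout can make this kind of school comparison easier to read than alternating rows. The sketch below pivots a subset of the data analytics rows from the output above (overallScore only) and computes the gap between the two schools. The variable names are illustrative.

```python
import pandas as pd

# A subset of the data analytics rows from the output above (overallScore only).
rows = [
    (2019, "ironhack", 4.4125), (2019, "springboard", 3.666667),
    (2020, "ironhack", 4.711538), (2020, "springboard", 4.516327),
    (2021, "springboard", 4.678378),
]
data_df = pd.DataFrame(rows, columns=["graduatingYear", "school", "overallScore"])

# One row per year, one column per school.
side_by_side = data_df.pivot(index="graduatingYear", columns="school", values="overallScore")

# Positive values mean Ironhack scored higher; years missing a school are dropped.
gap = (side_by_side["ironhack"] - side_by_side["springboard"]).dropna()
```

Only the years in which both schools have reviews (2019 and 2020) survive the `dropna`, and the gap is positive in both.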

UI/UX DESIGN:

In [23]:
reviews_ux = queries_execute("""SELECT graduatingYear,school,COUNT(comment_id),AVG(overallScore),AVG(overall),AVG(curriculum),AVG(jobSupport) 
FROM competitive_landscape.comments
WHERE school IN ('springboard', 'ironhack') AND program IN ('Full-time UX/UI Design Bootcamp', 'Part-time UX/UI Design', 'Part-time UX/UI Design ', 'UX/UI Design Bootcamp', 'UX/UI Design Part-Time', 'UI/UX Design Career Track', 'UX Career Track', 'UX Design')
GROUP BY graduatingYear,school 
ORDER BY graduatingYear ASC;""")

pd.DataFrame(reviews_ux,columns=['graduatingYear','school','comments','overallScore','overall','curriculum','jobSupport'])


Unnamed: 0,graduatingYear,school,comments,overallScore,overall,curriculum,jobSupport
0,2011.0,ironhack,1,4.7,5.0,4.0,5.0
1,2015.0,springboard,2,5.0,5.0,5.0,5.0
2,2016.0,ironhack,7,4.957143,4.857143,5.0,5.0
3,2016.0,springboard,13,4.676923,4.923077,4.846154,4.166667
4,2017.0,ironhack,58,4.818966,4.896552,4.719298,4.839286
5,2017.0,springboard,30,4.526667,4.8,4.733333,3.88
6,2018.0,ironhack,85,4.778824,4.823529,4.72619,4.795181
7,2018.0,springboard,77,4.562338,4.701299,4.644737,4.134615
8,2019.0,ironhack,59,4.652542,4.745763,4.59322,4.610169
9,2019.0,springboard,54,4.627778,4.814815,4.666667,4.363636


> The year with the most reviews is 2018 for Ironhack (85 reviews) versus 2020 for Springboard (113 reviews).

> In 2016, Ironhack had higher scores in all parameters except overall, but fewer reviews (7 versus 13).

> In 2017, Ironhack had higher scores in all parameters except curriculum, and more reviews (58 versus 30).

> In 2018, Ironhack had higher scores in all 4 parameters and more reviews (85 versus 77).

> In 2019, Ironhack had higher scores in overall score and jobSupport, and more reviews (59 versus 54).

> In 2020, Ironhack had lower scores in all parameters except curriculum, and fewer reviews (40 versus 113).

> In 2021, Ironhack had a higher score in overall, a similar score in curriculum, lower scores in overall score and jobSupport, and fewer reviews (7 versus 52).

# Third Question: Correlation between review scores and school countries

First, let's evaluate school locations:

In [39]:
school_locations_scores = queries_execute("""SELECT locations.location_id, locations.country_name, locations.school,
       school_score.reviews, school_score.avg_overall_score, school_score.avg_overall,
       school_score.avg_curriculum, school_score.avg_job_support
FROM competitive_landscape.locations
INNER JOIN (SELECT school, COUNT(comment_id) AS reviews, AVG(overallScore) AS avg_overall_score,
                   AVG(overall) AS avg_overall, AVG(curriculum) AS avg_curriculum, AVG(jobSupport) AS avg_job_support
            FROM competitive_landscape.comments
            GROUP BY school) AS school_score
ON locations.school = school_score.school
WHERE locations.country_name <> 'None'
GROUP BY locations.country_name, locations.school
ORDER BY locations.school;""")

pd.DataFrame(school_locations_scores,columns=['location id','country','school','num_reviews','avg_overall_score','avg_overall','avg_curriculum','avg_job_support'])

Unnamed: 0,location id,country,school,num_reviews,avg_overall_score,avg_overall,avg_curriculum,avg_job_support
0,15705,United States,app-academy,1045,4.602399,4.691346,4.603865,4.478261
1,17731,United Kingdom,brainstation,301,4.572757,4.740864,4.673759,4.179825
2,17706,United States,brainstation,301,4.572757,4.740864,4.673759,4.179825
3,16010,Canada,brainstation,301,4.572757,4.740864,4.673759,4.179825
4,16709,Portugal,ironhack,1045,4.825,4.874038,4.783944,4.808809
5,16377,Spain,ironhack,1045,4.825,4.874038,4.783944,4.808809
6,16375,United States,ironhack,1045,4.825,4.874038,4.783944,4.808809
7,16109,France,ironhack,1045,4.825,4.874038,4.783944,4.808809
8,16088,Brazil,ironhack,1045,4.825,4.874038,4.783944,4.808809
9,16086,Netherlands,ironhack,1045,4.825,4.874038,4.783944,4.808809


> App Academy is located only in the United States, with a total of 1045 reviews.

> Brainstation is located in 3 countries (United Kingdom, United States and Canada), with a total of 301 reviews.

> Ironhack is located in 8 countries (Portugal, Spain, United States, France, Brazil, Netherlands, Mexico and Germany), with a total of 1045 reviews.

> More locations does not mean more reviews: Ironhack and App Academy have the same number of reviews.

> On average, Ironhack has higher scores in all 4 parameters.
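
The relationship between number of locations and number of reviews can be tabulated directly instead of being read off the repeated rows. The sketch below is built only from the rows visible in the output above, which appears to stop after ten rows, so Ironhack is counted with six countries here rather than the eight mentioned in the observations. The column names `num_countries` and `reviews_per_country` are illustrative.

```python
import pandas as pd

# (school, country) pairs visible in the output above.
locations_df = pd.DataFrame(
    [
        ("app-academy", "United States"),
        ("brainstation", "United Kingdom"),
        ("brainstation", "United States"),
        ("brainstation", "Canada"),
        ("ironhack", "Portugal"),
        ("ironhack", "Spain"),
        ("ironhack", "United States"),
        ("ironhack", "France"),
        ("ironhack", "Brazil"),
        ("ironhack", "Netherlands"),
    ],
    columns=["school", "country"],
)

# Total reviews per school, as reported in the same output.
reviews = pd.Series(
    {"app-academy": 1045, "brainstation": 301, "ironhack": 1045}, name="num_reviews"
)

# Distinct countries per school, aligned with the review totals by school name.
countries = locations_df.groupby("school")["country"].nunique().rename("num_countries")
summary = pd.concat([countries, reviews], axis=1)
summary["reviews_per_country"] = summary["num_reviews"] / summary["num_countries"]
```

Note that the scores in the query above are school-level averages repeated for each country, so a true per-country comparison would need a location recorded on each comment.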