# **Import Libraries**

In [53]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [54]:
# install pyreadstat
!pip install pyreadstat

# **Data Wrangling**

## Load data

In [55]:
# read sav file
df = pd.read_spss("../input/students-math-score-for-different-teaching-style/1ResearchProjectData.sav")

In [56]:
# view random sample from data
df.sample(5)

In [57]:
# summary of the data:
# 7 columns in total
# 217 rows in total
# null values exist
# some data types are inappropriate
df.info()

## Data cleaning

In [58]:
# clean missing values

# number of missing values
df.isnull().sum().sum()

In [59]:
# very few values are missing, so it is reasonable to drop those rows
# then check for missing values again
df.dropna(inplace=True)
df.isnull().sum().sum()
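Before dropping rows, it can help to see which columns the gaps are in and what share of rows they affect. The sketch below uses a small made-up frame (`toy`, not the real dataset) and only standard pandas calls; the column values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real data (values are made up).
toy = pd.DataFrame({
    "Student": [1, 2, 3, 4],
    "Gender": ["Male", "Female", None, "Male"],
    "Score": [70.0, np.nan, 65.0, 80.0],
})

# Missing values per column, and the share of rows with at least one gap.
missing_per_column = toy.isnull().sum()
rows_with_missing = toy.isnull().any(axis=1).mean()

# Drop incomplete rows into a new frame instead of modifying in place.
toy_clean = toy.dropna().reset_index(drop=True)

print(missing_per_column)
print(f"share of incomplete rows: {rows_with_missing:.2f}")
print(f"rows kept: {len(toy_clean)}")
```

If the share of incomplete rows is small, as it is in this notebook, dropping them is usually a reasonable choice.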

In [60]:
# check for duplicate rows

# there are no duplicates
df.duplicated().sum()

In [61]:
# convert Student column type to integers
df = df.astype({"Student": int}, errors='raise') 
# check type
df.Student.dtype

In [62]:
# confirm changes
df.head()

# **Exploratory Data Analysis**

### Q1: What is the number of male and female students? 

In [63]:
male_female_number = df.Gender.value_counts()
male_female_number

In [64]:
male_female_number.plot(kind='bar', title="Number of male and female students")
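Raw counts can also be turned into percentages with `value_counts(normalize=True)`. The sketch below uses a made-up `genders` series rather than the real `df.Gender` column, so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for df.Gender (values are made up).
genders = pd.Series(["Male", "Male", "Female", "Male", "Female"], name="Gender")

# normalize=True returns each category's share of the total instead of a count.
shares = genders.value_counts(normalize=True)

# Express the shares as percentages rounded to one decimal place.
percentages = shares.mul(100).round(1)
print(percentages)
```

On the real data, the same call on `df.Gender` gives the percentage split directly.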

### Q2: How many students of each ethnic group are in the dataset?

In [65]:
# unique ethnic groups
df.Ethnic.unique()

In [66]:
df.Ethnic.value_counts().plot(kind='bar', title='Number of students in each ethnic group')

### Q3: What is the average score for students in general?

In [67]:
df.Score.mean()

### Q4: Does any gender have a higher score than the other?

In [68]:
# average the Score column (not the Student ID) for each gender
df.groupby('Gender')['Score'].mean().plot(kind='bar', title='Average score by gender')

### Q5: Does one ethnic group score higher than the others?

In [69]:
df.groupby('Ethnic')['Score'].mean().plot(kind='bar', title='Average score by ethnic group')

In [70]:
sns.barplot(data = df, x = 'Ethnic', y = 'Score', hue = 'Gender', ci = None)
plt.title('Average score for each ethnic group by gender')

### Preparation: to answer the next questions, it is useful to add a new column with the teaching method, based on the description provided with the data.

In [71]:
df.wesson.unique()

In [72]:
# Ms.Ruger and Ms.Smith >>  standards-based
#  Ms.Wesson >> traditional
df['Teaching Method'] = np.where(df['wesson'] == 'Ruger_Smith', 'standards_based', 'traditional')
df.head()
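The `np.where` call above labels every value other than `'Ruger_Smith'` as traditional. A more defensive variant is an explicit mapping, sketched below on a made-up frame. The second label, `"Wesson"`, is an assumption and should be checked against the output of `df.wesson.unique()`.

```python
import pandas as pd

# Hypothetical stand-in for the real column (values are made up).
toy = pd.DataFrame({"wesson": ["Ruger_Smith", "Wesson", "Ruger_Smith"]})

# "Wesson" is an assumed label; only "Ruger_Smith" appears in the notebook.
method_by_group = {
    "Ruger_Smith": "standards_based",
    "Wesson": "traditional",
}

# map() leaves NaN for any label missing from the dictionary,
# so unexpected values are not silently treated as "traditional".
toy["Teaching Method"] = toy["wesson"].map(method_by_group)
unmapped = toy["Teaching Method"].isnull().sum()

print(toy)
print(f"unmapped rows: {unmapped}")
```

With only two labels both approaches give the same result; the mapping just makes an unexpected third label visible.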

### Q6: What is the best way to teach students?

In [73]:
teaching_method_group = df.groupby('Teaching Method')['Score'].mean()
teaching_method_group

In [74]:
teaching_method_group.plot(kind = 'bar', title = 'Average score according to the method of teaching')

### Q7: Does one teaching method suit a specific ethnic group or gender better?

In [75]:
sns.barplot(data = df, x = 'Ethnic', y = 'Score', hue = 'Teaching Method', ci = None)
plt.title('Average student scores for each ethnic group by method of teaching')

In [76]:
sns.barplot(data = df, x = 'Gender', y = 'Score', hue = 'Teaching Method', ci = None)
plt.title('Average student scores for each gender by method of teaching')
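Bar heights alone hide how many students sit behind each average, which matters once the data is split into small subgroups. A minimal sketch of a table with both the mean and the group size follows. The `toy` frame and its scores are made up, and only the column names are taken from the notebook.

```python
import pandas as pd

# Hypothetical stand-in for df (scores are made up).
toy = pd.DataFrame({
    "Ethnic": ["Hispanic", "Hispanic", "Caucasian", "Caucasian", "Caucasian"],
    "Teaching Method": ["traditional", "standards_based",
                        "traditional", "traditional", "standards_based"],
    "Score": [60.0, 70.0, 80.0, 90.0, 75.0],
})

# Mean score and number of students for every ethnic group / method pair.
summary = (
    toy.groupby(["Ethnic", "Teaching Method"])["Score"]
       .agg(["mean", "count"])
)
print(summary)
```

A mean based on only a handful of students should be read with more caution than one based on a large group.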

### Q8: Do students of a particular ethnic group score better with a particular teacher?

In [77]:
# Ms.Ruger >> African-American
# Ms.Smith >> Caucasian teach Spanish.
# Ms.Wesson >>  Caucasian.
sns.barplot(data = df, x = 'Ethnic' , y = 'Score', hue = 'Teacher', ci = None)
plt.title('Average student scores for each ethnic group by teacher')

### Q9: Do students who get free lunch get higher grades than others?

In [78]:
df.groupby('Freeredu')['Score'].mean().plot(kind = 'bar', title = 'Average student score by lunch price' )

# **Conclusions**

1. Male students slightly outnumber female students, making up 55 percent of the student population.
1. Hispanic students are the largest group, while the other ethnic groups are similar in size.
1. Males and females score similarly on average.
1. Lunch price does not appear to affect students' scores.
1. Among Caucasian students, females have a higher average score than males.
1. The traditional teaching method shows slightly better results than the standards-based method for all students (also when broken down by gender and ethnic group).
1. Ms. Smith gives slightly better results with Caucasian and Hispanic students.
1. Ms. Ruger's results with African-American students are about the same as with other students.
1. Ms. Wesson gives the best results with all students.

In the end I recommend:
1.  Continuing to compare the traditional method and the standards-based method. The results are close so far, and although the traditional method did slightly better, we cannot be certain that it is the best.
1.  Treating Ms. Wesson's theory, that students do better with racially or socially compatible teachers, as unsupported by this data. There must be other criteria for teacher preference, such as a background in mathematics.

# **Limitations**

1. The dataset is so small that we cannot be certain our findings are accurate or reliable.
1. Other factors were not captured in the data, such as social environment, student intelligence, learning environment, or geographic area.
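One way to put a rough number on the first limitation is a bootstrap interval for the difference in average score between the two teaching methods. The sketch below is not part of the original analysis: the helper name `bootstrap_mean_diff` is hypothetical, the scores are made up, and it uses only NumPy.

```python
import numpy as np

def bootstrap_mean_diff(a, b, n_resamples=2000, seed=0):
    """Percentile bootstrap 95% interval for mean(a) - mean(b)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diffs = np.empty(n_resamples)
    for i in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resampled_a = rng.choice(a, size=a.size)
        resampled_b = rng.choice(b, size=b.size)
        diffs[i] = resampled_a.mean() - resampled_b.mean()
    low, high = np.percentile(diffs, [2.5, 97.5])
    return low, high

# Made-up scores, for illustration only.
traditional = [72, 65, 80, 77, 69, 74, 81, 70]
standards_based = [70, 68, 75, 73, 66, 71, 79, 64]

low, high = bootstrap_mean_diff(traditional, standards_based)
print(f"95% interval for the difference in means: [{low:.2f}, {high:.2f}]")
```

If the interval computed on the real scores includes zero, the data does not clearly separate the two methods, which would be consistent with the cautious recommendation above.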