# Replicate Andres's NAS vs ASER scatterplot
This notebook compares ASER 2018 data with NAS 2017 data. ASER and NAS differ in both sampling and the assessment tool used. I provide a brief recap of ASER and NAS and then discuss what data is used for this comparison. The ASER Centre has put together a good comparison of the two surveys [here](https://img.asercentre.org/docs/Bottom%20Panel/Key%20Docs/nas_stdvvs_aser.pdf).

## ASER
1. Frequency
    1. Basic survey conducted every other year
2. Sampling
    1. Only rural areas covered
    2. In each rural district, 30 villages are selected using PPS.  In each village, 20 households are selected using the right-hand rule
    3. Survey is conducted at home rather than in school. Thus, results are not affected by enrollment / attendance.
    4. All children 5-16 years old, regardless of school enrollment, are administered the assessment. (In addition, data on enrollment is collected for 3 and 4 year olds.)
3. Data collected
    1. Assessment tests basic reading and math.  
        a. Reading levels are: not even letters, letters, words, std 1 text, std 2 text
        b. Math levels are: not even numbers 1-9, numbers 1-9, numbers 10-99, subtraction, and division
    2. Basic household details and school attendance (e.g. whether public or private) also collected
4. Other
    1. ASER is carried out through local partners who rely on volunteers
    2. There are some internal and third party checks, but these seem to be pretty minimal
    3. The assessment is administered one-on-one by the enumerators who read out the questions to the child being assessed.
    4. ASER uses weights when calculating state and national figures
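
To make the PPS (probability proportional to size) step in the sampling description concrete, here is a minimal sketch of systematic PPS selection in plain Python. The village names, sizes, and the function name `pps_systematic_sample` are hypothetical, and ASER's actual procedure may differ in its details; this only illustrates the general technique.

```python
import random

def pps_systematic_sample(sizes, n, seed=0):
    """Select n units with probability proportional to size (systematic PPS).

    sizes: dict mapping unit name -> size measure (e.g. village population).
    A unit larger than the sampling interval can be selected more than once.
    """
    rng = random.Random(seed)
    total = sum(sizes.values())
    interval = total / n
    start = rng.random() * interval  # random start within the first interval
    targets = [start + k * interval for k in range(n)]

    selected = []
    cumulative = 0.0
    k = 0
    for name, size in sizes.items():
        cumulative += size
        # Every target that falls inside this unit's cumulative range selects it
        while k < n and targets[k] < cumulative:
            selected.append(name)
            k += 1
    return selected

# Hypothetical villages and populations (illustration only)
villages = {"A": 1200, "B": 300, "C": 4500, "D": 800, "E": 2200}
sample = pps_systematic_sample(villages, n=3)
print(sample)
```

Larger villages cover a wider stretch of the cumulative size scale, so they are more likely to contain one of the evenly spaced selection points.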


## NAS
NCERT has been conducting NAS since 2001 but in previous years they only collected data for a single grade at a time. (For example, they collected data for class 5 in academic year 2001-02, for class 8 in academic year 2002-03, and for class 3 in academic year 2003-04.  See [here](http://www.ncert.nic.in/programmes/NAS/pdf/DRC_report.pdf) page 6 for more info.) In addition, the sample sizes in previous years were smaller (they were only intended to be representative at the state level) and the assessment tool used was different. 


1. Frequency
    1. Previous versions of the NAS were carried out every year. NAS has been carried out in its current form only once (in 2017)
2. Sampling
    1. Schools are selected from UDISE using PPS (according to the ASER note on the NAS)
    2. Only government schools (and maybe private aided - not sure) are included
    3. Grades 3, 5, and 8 are covered
3. Data collected
    1. Tests language, math, and environmental science
    2. Seems like they also collect basic data about the school
4. Other
    1. Paper and pencil test
    2. Estimates are not weighted (actual enrollment varied quite a bit from the figures in UDISE, so ideally weights should have been used)
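
As a rough illustration of why weighting matters when enrollment varies across schools, the sketch below compares an unweighted and an enrollment-weighted mean. The scores and enrollment figures are made up for the example and are not taken from NAS or ASER.

```python
def weighted_mean(values, weights):
    """Return the weighted mean of values using the given weights."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical school-level mean scores and actual enrollment (illustration only)
scores = [60.0, 40.0, 50.0]
enrollment = [20, 100, 80]

unweighted = sum(scores) / len(scores)
weighted = weighted_mean(scores, enrollment)
print(unweighted, weighted)
```

Here the small high-scoring school pulls the unweighted mean up, while the weighted mean reflects that most students sit in the lower-scoring schools.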
    
## Comparing ASER and NAS
As per Andres, I attempt to make ASER and NAS as similar as possible by...
1. Restricting NAS to rural districts
2. Focusing only on students who attend government schools
3. 
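
The first two restrictions above could be sketched as follows. This uses plain Python records so it runs without the data files; the field names (`area`, `management`, `score`) and all values are assumptions for illustration, not the actual ASER or NAS variable names.

```python
# Hypothetical records; the field names and values are assumptions
records = [
    {"state": "X", "area": "rural", "management": "government", "score": 55.0},
    {"state": "X", "area": "urban", "management": "government", "score": 61.0},
    {"state": "Y", "area": "rural", "management": "private", "score": 70.0},
    {"state": "Y", "area": "rural", "management": "government", "score": 48.0},
    {"state": "X", "area": "rural", "management": "government", "score": 45.0},
]

def comparable_subset(rows):
    """Keep only rural records for government schools."""
    return [r for r in rows
            if r["area"] == "rural" and r["management"] == "government"]

def state_means(rows):
    """Average the score within each state."""
    totals = {}
    for r in rows:
        running_sum, count = totals.get(r["state"], (0.0, 0))
        totals[r["state"]] = (running_sum + r["score"], count + 1)
    return {state: s / c for state, (s, c) in totals.items()}

subset = comparable_subset(records)
means = state_means(subset)
print(means)
```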


In [4]:
import pandas as pd
import plotnine as p9
import os

In [6]:
data_dir = "/mnt/c/Users/dougj/Documents/Data/Education"
os.chdir(data_dir)

In [17]:
main_data = pd.read_csv("ASER 2018 and NAS 2017 grade 3 reading.csv")
# plotnine has no geom_label_repel (that is R's ggrepel), so plain geom_label is used instead
(p9.ggplot(main_data, p9.aes(x="aser_rank", y="nas_rank", label="State"))
 + p9.geom_abline(intercept=0, slope=1, color="orange")
 + p9.geom_abline(intercept=-6, slope=1, color="gray", size=1, linetype="dashed")
 + p9.geom_abline(intercept=6, slope=1, color="gray", size=1, linetype="dashed")
 + p9.geom_point(color="darkblue")
 + p9.geom_label(size=7, nudge_y=0.5)
 + p9.theme_bw()
 + p9.labs(title="State Rankings Based on Language Results for Std III Students (Rural)",
           x="Rank in ASER (2018)", y="Rank in NAS (2017)")
 + p9.scale_y_continuous(breaks=list(range(1, 29)))
 + p9.scale_x_continuous(breaks=list(range(1, 29)))
 + p9.theme(panel_grid_minor=p9.element_blank(),
            panel_grid_major=p9.element_line(color="gray", size=0.1)))
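
The scatterplot compares each state's rank in the two surveys, with dashed lines that appear to mark a difference of six ranks. A minimal sketch of how such ranks, a Spearman rank correlation, and the states outside the band could be computed is given below. The state labels and scores are invented, and ranking the highest score as 1 is an assumption about how `aser_rank` and `nas_rank` were built.

```python
def to_ranks(scores):
    """Rank states from 1 (highest score) downward; assumes no tied scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {state: i + 1 for i, state in enumerate(ordered)}

def spearman(rank_a, rank_b):
    """Spearman rank correlation for two untied rankings of the same states."""
    n = len(rank_a)
    d2 = sum((rank_a[s] - rank_b[s]) ** 2 for s in rank_a)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def outside_band(rank_a, rank_b, width=6):
    """States whose two ranks differ by more than `width` places."""
    return sorted(s for s in rank_a if abs(rank_a[s] - rank_b[s]) > width)

# Hypothetical state-level scores (illustration only)
aser_scores = {"A": 90, "B": 80, "C": 70, "D": 60, "E": 50, "F": 40, "G": 30, "H": 20}
nas_scores = {"A": 10, "B": 85, "C": 75, "D": 65, "E": 55, "F": 45, "G": 35, "H": 25}

aser_rank = to_ranks(aser_scores)
nas_rank = to_ranks(nas_scores)
rho = spearman(aser_rank, nas_rank)
outliers = outside_band(aser_rank, nas_rank)
print(rho, outliers)
```

In this toy data a single state moving from first to last place is enough to pull the rank correlation well below 1, which is the kind of disagreement the dashed band in the plot is meant to highlight.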
