## Drill: Formulating good research questions

*Categorize each of the following research questions as **"good"** or **"bad"**, and provide alternative formulations for the bad ones.*

Check these points for each question:

- What is already known?
- What sort of data, or ways to collect data, are available to me on this topic?
- What skills (statistics and programming) do I have, given the time limit?
- Can the research question (RQ) be answered in terms of quantities or probabilities?
- Can this RQ be asked in one sentence?

**1. What is the 1994 rate of juvenile delinquency in the U.S.?**

- This question is not interesting because the answer is already available in published reports and summary statistics (see the list below). According to these reports, there were about 1,555,200 juvenile delinquency cases in the U.S. in 1994. This was a 41% increase over the decade 1985-94, a 20% increase over 1990-94, and a 5% increase over the previous year (1993-94).

- More information is available on this topic in the resources below:
    1. Statistical report: [Delinquency Cases in Juvenile Courts, 1994](https://www.ncjrs.gov/pdffiles/delc94.pdf)
    2. Data: [Juvenile Court Statistics, 1994: \[United States\] (ICPSR 6882)](https://www.icpsr.umich.edu/icpsrweb/NACJD/studies/6882#); doi: https://doi.org/10.3886/ICPSR06882.v1
    3. [Juveniles Prosecuted in State Criminal Courts](https://www.bjs.gov/content/pub/ascii/JPSCC.TXT)
    4. [Office of Juvenile Justice and Delinquency Prevention (OJJDP)](https://www.ojjdp.gov/)
    
- The data ("Juvenile Court Statistics 1994") are not publicly available; however, they are available on request from OJJDP’s Juvenile Justice Clearinghouse by calling 800–638–8736.

- From the reports, it seems that the data contain survey questions and responses. Previous reports have shown that it is possible to report the data in quantities. 

- Yes, the question can be asked in one sentence, exactly as stated above.
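
The percentage increases reported for question 1 imply approximate case counts for the earlier years. The sketch below backs them out from the 1994 figure; it assumes the quoted percentages are rounded and are all measured relative to the 1994 count, so the results are rough estimates and not values taken from the reports. The function and variable names are illustrative only.

```python
# Back out the earlier-year case counts implied by the reported 1994 figures.
CASES_1994 = 1_555_200  # delinquency cases in 1994, as quoted in the answer above

# Reported percentage increases up to 1994, keyed by the base year.
INCREASE_TO_1994 = {1985: 0.41, 1990: 0.20, 1993: 0.05}


def implied_base_count(final_count: float, pct_increase: float) -> float:
    """Return the base-year count that grows to final_count after pct_increase."""
    return final_count / (1.0 + pct_increase)


implied = {
    year: implied_base_count(CASES_1994, pct)
    for year, pct in INCREASE_TO_1994.items()
}

for year, count in implied.items():
    # The inputs are rounded percentages, so these are approximations only.
    print(f"{year}: roughly {count:,.0f} cases")
```

A calculation this short is one sign that the question is descriptive: the answer can be read off existing summary statistics.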


**2. What can we do to reduce juvenile delinquency in the U.S.?**

- Bad, although only from the perspective of a Data Scientist. 

- Available resources, such as [Juvenile Delinquency Prevention](https://www.impactlaw.com/criminal-law/juvenile/prevention), list several factors (education, recreation, community involvement, bullying prevention, parent-child interaction, etc.) as effective components of juvenile delinquency prevention.

- However, evaluating these measures requires long-term qualitative research; they cannot be assessed directly with quantitative measures in a limited amount of time.

- An alternative would be to pick one of the components mentioned above and ask whether it has an effect on reducing juvenile delinquency. For example, as in the next question, one could ask whether education reduces juvenile delinquency by increasing awareness and civic values. Data for such a study could come from existing long-term research on the topic, which we can then curate to fit our needs.


**3. Does education play a role in reducing juvenile delinquents' return to crime?**

- This is a more specific research question, as suggested in the answer to the previous question, and it could be answered in terms of quantities.

- A great deal of scientific research has been published on this topic. However, gathering data on this topic could be an issue as most of this research has been done by academic institutions. Data could be available on request from the contributors in the workshop on Education and Delinquency: Panel on Juvenile Crime. See: [National Research Council. 2000. Education and Delinquency: Summary of a Workshop.](https://doi.org/10.17226/9972).

- Given basic statistics and research skills, this question could be answered with quantitative summaries.


**4. How many customers does AT&T currently serve in Washington, DC?**

- This is not an interesting Data Science question, as the information can be found on [this AT&T page](https://engage.att.com/dc/). According to this page, as of Aug 2017, AT&T covers 100% of the residents of Washington, DC. However, no data are available online to support this claim.


**5. What factors lead consumers to choose AT&T over other service providers?**

- It is a better question, relevant to the perspectives of a Data Scientist. 

- According to a [CIRP study](https://www.huffingtonpost.com/michael-r-levin/why-do-consumers-switch-m_b_6525492.html), AT&T customers switch carriers primarily because of the cost of their service. The secondary factor is the structure of their plans.

- Data for the CIRP analysis come from surveys based on 2000 US subjects who activated a new or used mobile phone in the 90 days preceding four quarterly surveys covering the period October 2013-September 2014. 

- If the CIRP data are not available, researchers could create customized exit surveys for departing customers to investigate their reasons for leaving the carrier. Analyzing such exit surveys would inform us about the factors affecting existing customers.
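
As a rough illustration of how exit-survey answers could be turned into quantities, the sketch below tallies the stated reasons for leaving and reports each reason's share. The `responses` list is made-up placeholder data, not CIRP results, and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical exit-survey answers (placeholder data, NOT real survey results).
responses = [
    "cost",
    "plan structure",
    "cost",
    "network quality",
    "cost",
    "plan structure",
]


def reason_shares(answers):
    """Return (reason, count, share) tuples, most frequent reason first."""
    counts = Counter(answers)
    total = sum(counts.values())
    return [(reason, n, n / total) for reason, n in counts.most_common()]


shares = reason_shares(responses)

for reason, n, share in shares:
    print(f"{reason}: {n} responses ({share:.0%})")
```

With real survey data, the same tally would show which factor dominates among departing customers.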


**6. How can AT&T attract more customers?**

- This is a good research question for Data Scientists working in User Experience roles.

- According to the above-mentioned CIRP study, customers are loyal to AT&T primarily because of its network quality, even though cost is a pain point. Based on this previous research, the research design should focus on promoting network quality and reducing cost. A researcher could run A/B tests on customized offers to investigate whether these factors attract customers.

- Given skills in A/B testing and statistics, a researcher would be able to run the experiments and report the results in terms of quantitative metrics.
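
One common way to analyze such an A/B test is a two-proportion z-test on sign-up rates. The sketch below is a minimal version using only the standard library; the visitor and sign-up counts are hypothetical, and a real analysis would also need a pre-planned sample size and checks on how customers were assigned to each offer.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value) using the pooled standard error.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both offers convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: offer A (current pricing) vs. offer B (discounted plan).
z, p_value = two_proportion_z_test(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value would suggest that the two offers convert at different rates, which is the kind of quantitative answer the checklist asks for.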


**7. Why did the Challenger Shuttle explode?**

- Good question. 

- An existing report ([Draper, 1993](http://rexa.info/paper/150b816894ae1bf07f15679b3e57b5ad9b47b392)) suggests that a faulty propulsion system in the shuttle caused the explosion. According to the report, the explosion was traced to the failure of one of the three field joints on one of the two solid booster rockets. Each of these six field joints includes two O-rings, designated as primary and secondary, which fail when phenomena called erosion and blowby both occur.

- Data are available from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Challenger+USA+Space+Shuttle+O-Ring) and include all the relevant variables discussed in the report above.

- Machine Learning skills are necessary for analyzing this dataset. 



**8. Which genes are associated with increased risk of breast cancer?**

- Good research question for a data scientist with a bioinformatics specialization. A lot of [research](https://www.breastcancer.org/risk/factors/genetics) and [resources for data](https://www.nature.com/articles/npjbcancer201631) are available on this topic.



**9. Is it better to read to children at night or in the morning?**

- Bad question, as it does not define what "better" means. If the question were more specific (e.g., "Does reading at night result in better memory in children?"), it could be answered quantitatively. However, given a limited amount of time, answering the question will not be possible, as it requires a longitudinal study to measure the effects.


**10. How does Google’s search algorithm work?**

- This is potentially a bad question, as the answer will simply be an explanation of the PageRank algorithm. There are available [resources](https://computer.howstuffworks.com/internet/basics/google1.htm) that explain the algorithm.


