# Data Analysis

## The data analysis process includes:
    1. Posing a question
    2. Wrangling data into a format you can use and fixing any problems with it
    3. Exploring the data, finding patterns in it, and building your intuition about it
    4. Drawing conclusions and/or making predictions
    5. Communicating your findings

### Examples of Python Data Analysis Libraries
    NumPy
    pandas
    Matplotlib

## Question Phase
Examples
- Characteristics of students who pass projects
- How to stock a store

## Wrangling Phase
    1. Data acquisition
    2. Data cleaning
  To learn more about data wrangling, check out this free Udacity course:
  https://www.udacity.com/course/data-wrangling-with-mongodb--ud032?_ga=1.79543568.587032029.1458426814

### Data Acquisition Methods

- Downloading files
- Accessing an API
- Scraping a web page
- Combining data from different formats
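
As a rough illustration of the last item, here is a minimal sketch that combines a CSV source with a JSON source using only the standard library. It is written for Python 3 (an assumption; the notebook cells in these notes use Python 2), and the sample text, the `status` value `current`, and the names `csv_text`, `json_text`, and `combined` are all hypothetical, not taken from the course data.

```python
import csv
import io
import json

# Hypothetical sample data: one source is CSV text, the other is JSON text.
csv_text = "account_key,status\n448,canceled\n700,current\n"
json_text = '[{"account_key": "448", "lessons_completed": 0}]'

# Parse each format into a list of dictionaries.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))
json_rows = json.loads(json_text)

# Index the JSON records by account key so they can be looked up quickly.
lessons_by_account = {row["account_key"]: row for row in json_rows}

# Merge: copy each CSV row and add the matching JSON fields when they exist.
combined = []
for row in csv_rows:
    merged = dict(row)
    merged.update(lessons_by_account.get(row["account_key"], {}))
    combined.append(merged)

print(combined[0])
```

Rows without a match in the JSON data (account `700` here) are kept unchanged, which is only one of several reasonable ways to handle missing records.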

### Data Format
<b>CSV</b> - Comma-Separated Values
     - Like a spreadsheet with no formulas
     - Easy to process with code (unlike .xlsx files from, e.g., Microsoft Excel)
  Python's documentation for CSV file reading and writing can be found at
  https://docs.python.org/2/library/csv.html


In [4]:
# For this project I'm analyzing .csv files
# with a random selection of students who had
# completed a project at the time the data was collected

In [3]:
## In Python, the contents of a CSV file are commonly
## represented as a list of rows

# option 1: Each row is a list
csv = [['A1', 'A2', 'A3'],
       ['B1', 'B2', 'B3']]

# option 2: Each row is a dictionary
# This option works best if your file has a header row:
# the keys are the column names and
# the values are the fields of that row.
csv = [{'name1': 'A1', 'name2': 'A2', 'name3': 'A3'},
       {'name1': 'B1', 'name2': 'B2', 'name3': 'B3'}]
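
To make the two options concrete, the sketch below shows how the standard library's `csv.reader` and `csv.DictReader` produce each representation. It assumes Python 3 and reads from an in-memory string (the hypothetical `text` variable) instead of a file, so it runs without any data on disk.

```python
import csv
import io

# Hypothetical file contents, held in memory so no file on disk is needed.
text = "name1,name2,name3\nA1,A2,A3\nB1,B2,B3\n"

# Option 1: csv.reader yields each row as a list
# (the header is simply the first row).
rows_as_lists = list(csv.reader(io.StringIO(text)))

# Option 2: csv.DictReader uses the header row as the keys
# for every following row.
rows_as_dicts = [dict(row) for row in csv.DictReader(io.StringIO(text))]

print(rows_as_lists[1])
print(rows_as_dicts[0])
```

Note that with option 1 the header stays in the data as row 0, while with option 2 it is consumed and only the data rows remain.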

In [23]:
import unicodecsv

with open("enrollments.csv", 'rb') as f:
    reader = unicodecsv.DictReader(f)
    enrollments = list(reader)
    
    
enrollments[0]


{u'account_key': u'448',
 u'cancel_date': u'2015-01-14',
 u'days_to_cancel': u'65',
 u'is_canceled': u'True',
 u'is_udacity': u'True',
 u'join_date': u'2014-11-10',
 u'status': u'canceled'}

In [1]:
# create a function to read csv files
import unicodecsv

def read_csv(filename):
    with open(filename, 'rb') as f:
        # read each row as a dictionary keyed by column name
        reader = unicodecsv.DictReader(f)
        # return the rows as a list of dictionaries
        return list(reader)
    

daily_engagement = read_csv('daily_engagement.csv')
print daily_engagement[0]

#enrollments = read_csv('enrollments.csv')
#print enrollments[0]
#project_submissions = read_csv('project_submissions.csv') 
#print project_submissions[1] 

{u'lessons_completed': u'0.0', u'num_courses_visited': u'1.0', u'total_minutes_visited': u'11.6793745', u'projects_completed': u'0.0', u'acct': u'0', u'utc_date': u'2015-01-09'}
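
The cell above targets Python 2, where `unicodecsv` is needed for Unicode handling. On Python 3 the built-in `csv` module works with text directly, so an equivalent helper might look like the sketch below. The UTF-8 encoding is an assumption about the files, and the temporary file with its two-column contents is a hypothetical stand-in for the course data.

```python
import csv
import os
import tempfile


def read_csv(filename):
    """Return the rows of a CSV file as a list of dictionaries."""
    # newline='' is what the csv module documentation recommends
    # for files that are passed to a reader.
    with open(filename, newline='', encoding='utf-8') as f:
        return [dict(row) for row in csv.DictReader(f)]


# Demo with a throwaway file (hypothetical contents, not the course data).
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 newline='', encoding='utf-8') as tmp:
    tmp.write("acct,utc_date\n0,2015-01-09\n")

rows = read_csv(tmp.name)
os.remove(tmp.name)
print(rows[0])
```

The values come back as plain `str` objects, so the printed dictionary has no `u''` prefixes.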


In [39]:
import unicodecsv

enrollments_filename = 'enrollments.csv'

## Longer version of code (replaced with shorter, equivalent version below)

# enrollments = []
# f = open(enrollments_filename, 'rb')
# reader = unicodecsv.DictReader(f)
# for row in reader:
#     enrollments.append(row)
# f.close()

with open(enrollments_filename, 'rb') as f:
    reader = unicodecsv.DictReader(f)
    enrollments = list(reader)

print enrollments[0]
print

### Load the engagement and submission data with code similar
### to the above. The data is stored in files with the given
### filenames. Then print a row from each table to make sure
### that the code works.

engagement_filename = 'daily_engagement.csv'
submissions_filename = 'project_submissions.csv'

def read_csv(filename):
    with open(filename, 'rb') as f:
        reader = unicodecsv.DictReader(f)
        return list(reader)


daily_engagement = read_csv(engagement_filename)
print daily_engagement[0]
print
project_submissions = read_csv(submissions_filename)
print project_submissions[1]




{u'status': u'canceled', u'is_udacity': u'True', u'is_canceled': u'True', u'join_date': u'2014-11-10', u'account_key': u'448', u'cancel_date': u'2015-01-14', u'days_to_cancel': u'65'}

{u'lessons_completed': u'0.0', u'num_courses_visited': u'1.0', u'total_minutes_visited': u'11.6793745', u'projects_completed': u'0.0', u'acct': u'0', u'utc_date': u'2015-01-09'}

{u'lesson_key': u'3176718735', u'processing_state': u'EVALUATED', u'account_key': u'256', u'assigned_rating': u'INCOMPLETE', u'completion_date': u'2015-01-13', u'creation_date': u'2015-01-10'}
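
Every value in the rows printed above is a string, including dates, numbers, and booleans. A typical data cleaning step converts them to proper types. The sketch below does this for the enrollment row shown above, assuming Python 3. The helper names `parse_date` and `parse_int` are hypothetical, and treating an empty string as a missing value is an assumption about how the files encode gaps.

```python
from datetime import datetime


def parse_date(text):
    """Return a datetime for 'YYYY-MM-DD' text, or None for an empty string."""
    return datetime.strptime(text, '%Y-%m-%d') if text else None


def parse_int(text):
    """Return an int, or None for an empty string."""
    return int(text) if text else None


# The enrollment row from the output above (u'' prefixes dropped).
enrollment = {
    'account_key': '448',
    'cancel_date': '2015-01-14',
    'days_to_cancel': '65',
    'is_canceled': 'True',
    'is_udacity': 'True',
    'join_date': '2014-11-10',
    'status': 'canceled',
}

# Work on a copy so the raw row stays available for comparison.
cleaned = dict(enrollment)
cleaned['join_date'] = parse_date(enrollment['join_date'])
cleaned['cancel_date'] = parse_date(enrollment['cancel_date'])
cleaned['days_to_cancel'] = parse_int(enrollment['days_to_cancel'])
cleaned['is_canceled'] = enrollment['is_canceled'] == 'True'
cleaned['is_udacity'] = enrollment['is_udacity'] == 'True'

print(cleaned['days_to_cancel'], cleaned['is_canceled'])
```

Once the dates are `datetime` objects they can be subtracted, which makes it possible to check that `days_to_cancel` agrees with the two date columns.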


## More Questions
- How long does it take to submit a project?
- How do students who pass their projects differ from those who don't?
- How much time do students spend taking classes?
- How does time spent relate to lessons/projects completed?
- How does engagement change?

These are all valid questions to ask and answer.
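
As a small starting point for the first question, the sketch below measures the gap between two date fields of the submission row printed earlier. It assumes Python 3 and, more importantly, assumes that the span from `creation_date` to `completion_date` is a fair measure of how long a submission takes. The function name `days_to_complete` is hypothetical.

```python
from datetime import datetime

# Date fields from the project submission row printed earlier.
submission = {
    'creation_date': '2015-01-10',
    'completion_date': '2015-01-13',
}


def days_to_complete(row):
    """Return the whole days between a row's creation and completion dates."""
    created = datetime.strptime(row['creation_date'], '%Y-%m-%d')
    completed = datetime.strptime(row['completion_date'], '%Y-%m-%d')
    return (completed - created).days


print(days_to_complete(submission))  # prints 3
```

Applying the same function to every row and summarizing the results would give a first answer for the whole table, provided rows with missing dates are handled first.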