# Preparation for Facebook Technical Questions

Here are my notes for preparing for the technical portion of the interview. To simulate interview conditions, I wrote all code in markdown mode, converting the cells to executable code only after I felt confident it was correct.

## Notes

<a id='pandas-dates'></a> [**Pandas - Time Series / Date Functionality**](http://pandas.pydata.org/pandas-docs/version/0.23/timeseries.html)

* Uses the numpy ```datetime64``` and ```timedelta64``` datatypes

<table border="1" class="docutils">
<colgroup>
<col width="15%">
<col width="27%">
<col width="58%">
</colgroup>

<thead valign="bottom">
<tr class="row-odd"><th class="head">Class</th>
<th class="head">Remarks</th>
<th class="head">How to create</th>
</tr>
</thead><tbody valign="top">
<tr class="row-even"><td><code class="docutils literal notranslate"><span class="pre">Timestamp</span></code></td>
<td>Represents a single timestamp</td>
<td><code class="docutils literal notranslate"><span class="pre">to_datetime</span></code>, <code class="docutils literal notranslate"><span class="pre">Timestamp</span></code></td>
</tr>
<tr class="row-odd"><td><code class="docutils literal notranslate"><span class="pre">DatetimeIndex</span></code></td>
<td>Index of <code class="docutils literal notranslate"><span class="pre">Timestamp</span></code></td>
<td><code class="docutils literal notranslate"><span class="pre">to_datetime</span></code>, <code class="docutils literal notranslate"><span class="pre">date_range</span></code>, <code class="docutils literal notranslate"><span class="pre">bdate_range</span></code>, <code class="docutils literal notranslate"><span class="pre">DatetimeIndex</span></code></td>
</tr>
<tr class="row-even"><td><code class="docutils literal notranslate"><span class="pre">Period</span></code></td>
<td>Represents a single time span</td>
<td><code class="docutils literal notranslate"><span class="pre">Period</span></code></td>
</tr>
<tr class="row-odd"><td><code class="docutils literal notranslate"><span class="pre">PeriodIndex</span></code></td>
<td>Index of <code class="docutils literal notranslate"><span class="pre">Period</span></code></td>
<td><code class="docutils literal notranslate"><span class="pre">period_range</span></code>, <code class="docutils literal notranslate"><span class="pre">PeriodIndex</span></code></td>
</tr>
</tbody>
</table>

* Both ```Timestamp``` and ```Period``` objects can serve as an index. Lists of them are automatically coerced into ```DatetimeIndex``` and ```PeriodIndex``` objects, respectively.
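
As a quick check of the coercion described above, here is a minimal sketch. The variable names (`stamps`, `periods`, `ts`, `ps`) and the sample dates are made up for illustration.

```python
import pandas as pd

# A list of Timestamp objects used as an index is coerced to a DatetimeIndex.
stamps = [pd.Timestamp('2012-05-01'),
          pd.Timestamp('2012-05-02'),
          pd.Timestamp('2012-05-03')]
ts = pd.Series([1.0, 2.0, 3.0], index=stamps)
print(type(ts.index).__name__)

# A list of Period objects is coerced to a PeriodIndex.
periods = [pd.Period('2012-01', freq='M'),
           pd.Period('2012-02', freq='M'),
           pd.Period('2012-03', freq='M')]
ps = pd.Series([1.0, 2.0, 3.0], index=periods)
print(type(ps.index).__name__)
```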

In [1]:
import pandas as pd
# Can convert from strings to date-like objects via pd.to_datetime
pd.to_datetime(pd.Series(['Jul 31, 2009', '2010-01-10', None]))

0   2009-07-31
1   2010-01-10
2          NaT
dtype: datetime64[ns]

In [2]:
# Make range of dates:
pd.date_range('2010-02-20', '2011-03-05')

DatetimeIndex(['2010-02-20', '2010-02-21', '2010-02-22', '2010-02-23',
               '2010-02-24', '2010-02-25', '2010-02-26', '2010-02-27',
               '2010-02-28', '2010-03-01',
               ...
               '2011-02-24', '2011-02-25', '2011-02-26', '2011-02-27',
               '2011-02-28', '2011-03-01', '2011-03-02', '2011-03-03',
               '2011-03-04', '2011-03-05'],
              dtype='datetime64[ns]', length=379, freq='D')
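
Since the date types sit on top of ```datetime64``` and ```timedelta64```, date arithmetic and component access are worth practising too. The snippet below is a small sketch that reuses the two dates from the conversion example. The names `dates`, `gap`, and `next_week` are illustrative only.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2009-07-31', '2010-01-10']))

# Subtracting one timestamp from another gives a Timedelta (timedelta64 under the hood).
gap = dates[1] - dates[0]
print(gap.days)

# Adding a Timedelta shifts every date in the Series.
next_week = dates + pd.Timedelta(days=7)
print(next_week)

# The .dt accessor exposes calendar components such as month and day.
print(dates.dt.month.tolist(), dates.dt.day.tolist())
```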

## Exercises

### Mock Question from E-mail

You are given an attendance log for every student in a school district, ```attendance_events```:

| date | student_id | attendance |
|:----:|:----------:|:----------:|
|      |            |            |

You are also given a summary table with demographics for each student in the district, ```all_students```:

|student_id | school_id | grade_level | date_of_birth | hometown |
|-----------|-----------|-------------|---------------|----------|

Using this data, you could answer questions like the following:

* What percent of students attend school on their birthday?
* Which grade level had the largest drop in attendance between yesterday and today?
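
One possible approach to the first question is sketched below on a tiny hand-made dataset. It makes several assumptions that the prompt does not state: ```attendance``` is 1 for present and 0 for absent, only birthdays that fall on a logged date are counted, and a February 29 birthday matches only in leap years. The function name `pct_attending_on_birthday` and the toy frames `events` and `students` are hypothetical.

```python
import pandas as pd

def pct_attending_on_birthday(attendance_events, all_students):
    """Percent of logged birthdays on which the student was present."""
    # Attach each student's date of birth to their attendance rows.
    merged = attendance_events.merge(
        all_students[['student_id', 'date_of_birth']], on='student_id')

    # Keep only rows where the log date has the same month and day as the birthday.
    is_birthday = (
        (merged['date'].dt.month == merged['date_of_birth'].dt.month)
        & (merged['date'].dt.day == merged['date_of_birth'].dt.day)
    )
    birthdays = merged[is_birthday]

    # With attendance coded 0/1, the mean is the fraction present.
    return 100 * birthdays['attendance'].mean()

# Toy data: both students have a March 1 birthday; only student 100 attends that day.
events = pd.DataFrame({
    'date': pd.to_datetime(['2018-03-01', '2018-03-01', '2018-03-02', '2018-03-02']),
    'student_id': [100, 101, 100, 101],
    'attendance': [1, 0, 0, 1],
})
students = pd.DataFrame({
    'student_id': [100, 101],
    'date_of_birth': pd.to_datetime(['2003-03-01', '2004-03-01']),
})
print(pct_attending_on_birthday(events, students))
```

In an interview it is probably worth asking whether the percentage should be per student or per birthday event, since a multi-year log can contain several birthdays for the same student.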

In [6]:
import pandas as pd
import numpy as np

# Functions used to generate mock data simulating the tables given above.

n_students = 1000
n_days = 10
start_date = '2017-09-01'
end_date = '2018-06-15'

################################################################################
# Attendance Table                                                             #
################################################################################
def _make_event_dates(n_students, start_date, end_date):
    dr = pd.date_range(start_date, end_date)
    dates = []
    for day in dr:
        dates.extend([day] * n_students)
    return dates

def _make_student_ids(n_students, n_days=None):
    student_ids = list(range(100, 100 + n_students))
    if n_days is not None:
        student_ids = student_ids * n_days
    return student_ids

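def _make_attendance(n_students, n_days):
    # Draw an independent value for every student on every day (assumed coding:
    # 1 = present, 0 = absent). Repeating a single day's draw would give every
    # day identical attendance, so day-over-day changes would always be zero.
    return list(np.random.choice(2, n_students * n_days, p=[0.3, 0.7]))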

def build_attendance_events(n_students, start_date, end_date):
    columns = ['date', 'student_id', 'attendance']
    n_days = len(pd.date_range(start_date, end_date))
    
    dates       = _make_event_dates(n_students, start_date, end_date)
    student_ids = _make_student_ids(n_students, n_days)
    attendance  = _make_attendance(n_students, n_days)
    data = list(zip(dates, student_ids, attendance))
    
    df = pd.DataFrame(data=data, columns=columns)
    return df

################################################################################
# District All Students Table                                                  #
################################################################################
def _make_school_ids(n_students):
    schools = ['South River High School',
               'New Brunswick High School',
               'East Brunswick High School',
               'Edison High School']
    return list(np.random.choice(schools, n_students))

def _make_grade_levels(n_students):
    grades = ['Freshman', 'Sophomore', 'Junior', 'Senior']
    return list(np.random.choice(grades, n_students))
    
def _make_DOBs(grade_levels):
    birth_years = {
        'Freshman': 2005,
        'Sophomore': 2004,
        'Junior': 2003,
        'Senior': 2002
    }
    years = [birth_years[xx] for xx in grade_levels]
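    months = list(np.random.choice(np.arange(1,13), len(grade_levels)))
    # Days are capped at 28 so every month/day combination is a valid date.
    days = list(np.random.choice(np.arange(1,29), len(grade_levels)))
    # The strings are month-day-year; pass the format explicitly to avoid ambiguity.
    DOBs = pd.to_datetime(['{}-{}-{}'.format(*dd) for dd in zip(months, days, years)],
                          format='%m-%d-%Y')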
    return DOBs

def _make_hometowns(school_ids):
    hometowns = [school.split('High School')[0].strip() for school in school_ids]
    return hometowns
    

def build_all_students(n_students):
    student_ids  = _make_student_ids(n_students)
    school_ids   = _make_school_ids(n_students)
    grade_levels = _make_grade_levels(n_students)
    DOBs         = _make_DOBs(grade_levels)
    hometowns    = _make_hometowns(school_ids)
    
    columns = ['student_id', 'school_id', 'grade_level', 'date_of_birth', 'hometown']
    data = list(zip(student_ids, school_ids, grade_levels, DOBs, hometowns))
    df = pd.DataFrame(data=data, columns=columns)
    return df
    

attendance_events = build_attendance_events(n_students, start_date, end_date)
all_students = build_all_students(n_students=n_students)
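
For the second question, a rough sketch follows. It assumes that "drop in attendance" means the change in attendance rate (fraction present) rather than head count, that ```attendance``` is coded 1 for present and 0 for absent, and that "today" is the latest date in the log with "yesterday" being the previous calendar day. The function name `largest_attendance_drop`, its `today` parameter, and the toy frames are hypothetical. The same function could be called with `attendance_events` and `all_students` from the cell above.

```python
import pandas as pd

def largest_attendance_drop(attendance_events, all_students, today=None):
    """Return (grade_level, drop) for the largest fall in attendance rate."""
    # Default "today" to the most recent date in the log (an assumption).
    if today is None:
        today = attendance_events['date'].max()
    today = pd.Timestamp(today)
    yesterday = today - pd.Timedelta(days=1)

    # Restrict to the two days of interest before joining, to keep the merge small.
    recent = attendance_events[attendance_events['date'].isin([yesterday, today])]
    merged = recent.merge(all_students[['student_id', 'grade_level']], on='student_id')

    # Attendance rate per grade and date, with one column per date.
    rates = (merged.groupby(['grade_level', 'date'])['attendance']
                   .mean()
                   .unstack('date'))

    # A positive value means attendance fell from yesterday to today.
    drop = rates[yesterday] - rates[today]
    return drop.idxmax(), drop.max()

# Toy data: Freshman attendance falls from 100% to 50%; Senior stays at 50%.
events = pd.DataFrame({
    'date': pd.to_datetime(['2018-03-01'] * 4 + ['2018-03-02'] * 4),
    'student_id': [100, 101, 102, 103] * 2,
    'attendance': [1, 1, 1, 0, 0, 1, 1, 0],
})
students = pd.DataFrame({
    'student_id': [100, 101, 102, 103],
    'grade_level': ['Freshman', 'Freshman', 'Senior', 'Senior'],
})
print(largest_attendance_drop(events, students))
```

Note that the mock log includes weekends, so with real data "yesterday" might need to mean the previous school day instead.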