## Exploring Shelter Data
I'm taking it upon myself to learn a bit about data science. 

Since I have a lot of interest in homeless services in Toronto (see [ChalmersCards](chalmerscards.com)), I felt Toronto's [public shelter occupancy dataset](https://www.toronto.ca/ext/open_data/catalog/data_set_files/SMIS_Daily_Occupancy_2017.csv) would make a good learning environment.

---
#### My current understanding of the application of data science:
Data Science: The task of generating insights.

Actionable insights are determined by understanding the values of the client/subject, and using those values to define an appropriate research question. 

1. A research **question** is used to define the research **method**. 
2. Executing the **method** will render a **result and insights**. 
3. From the **insight**, **actions** can be determined. 

---
#### Understanding the client
One of shelters' greatest concerns is capacity. Other than shelter names and properties, it's the only metric tracked in this dataset.

Maybe my question could be:
**Question** Which shelters are most susceptible to hitting capacity?
or
**Question** During what time of the year is shelter occupancy highest?
or
**Question** Which shelter type (male/female/mix/family) has the highest occupancy?

> I'll tackle these items one at a time

# Which shelters are most susceptible to hitting capacity?

+ Question : Which shelters are most susceptible to hitting capacity?
+ Method : For each shelter, collect occupancy and capacity numbers for the year. Find which shelters spend the most time with their occupancy close to their capacity.

In [7]:
import pandas as pd

In [8]:
import numpy as np

In [9]:
import matplotlib.pyplot as plt

## Importing Libraries
I don't really know what I'm doing, so I'll grab pandas, its dependency NumPy, and the visualization library Matplotlib.

In [11]:
data = pd.read_csv('data/shelter_Occupancy_2017.csv')
print(data.columns)

Index(['OCCUPANCY_DATE', 'ORGANIZATION_NAME', 'SHELTER_NAME',
       'SHELTER_ADDRESS', 'SHELTER_CITY', 'SHELTER_PROVINCE',
       'SHELTER_POSTAL_CODE', 'FACILITY_NAME', 'PROGRAM_NAME', 'SECTOR',
       'OCCUPANCY', 'CAPACITY'],
      dtype='object')


## Importing dataset
I've grabbed the shelter_occupancy_2017.csv dataset from Toronto's open data portal. I've also printed all the columns for reference.

Now what I need to do is grab each shelter's occupancy and capacity numbers and find which shelters were consistently close to capacity.
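
One possible way to line occupancy up against capacity is a per-row rate. The sketch below uses a few made-up rows with the same column names as the dataset (the shelter names and numbers are hypothetical, not taken from the CSV). It also assumes that a row with a capacity of 0 should be treated as missing rather than as a full shelter, and `OCCUPANCY_RATE` is a column name I've invented for illustration.

```python
import pandas as pd

# Hypothetical rows that mimic the dataset's columns; not real shelter data.
sample = pd.DataFrame({
    'OCCUPANCY_DATE': ['01/01/2017', '01/01/2017', '01/02/2017', '01/02/2017'],
    'SHELTER_NAME': ['Shelter A', 'Shelter B', 'Shelter A', 'Shelter B'],
    'OCCUPANCY': [68, 9, 69, 0],
    'CAPACITY': [69, 10, 69, 0],
})

# Assumption: a capacity of 0 means "no usable data", so mask it to NaN
# instead of dividing by zero.
capacity = sample['CAPACITY'].where(sample['CAPACITY'] > 0)
sample['OCCUPANCY_RATE'] = sample['OCCUPANCY'] / capacity

# Average rate per shelter; mean() skips the NaN rows.
mean_rate = sample.groupby('SHELTER_NAME')['OCCUPANCY_RATE'].mean()
print(mean_rate)
```

On the real data, the same two lines (mask, then divide) could be applied to `data` directly.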

## Defining the method
This may be an unfair assumption, but I'll make it anyway:

---
> A shelter's facilities will be built around its capacity. Therefore, what matters most to the shelter's performance is the proportion of free beds, and not the number of free beds. 

For example, using the above assumption, we would infer that:
> Shelter 'A' that is at 68/69 beds capacity

is in worse shape than 
> Shelter 'B' that is at 9/10 beds capacity

because shelter 'A' has about 1.4% of its capacity left (1/69), while shelter 'B' has 10% of its capacity left (1/10).
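
As a quick check of that arithmetic: one free bed out of 69 is a fraction of about 0.014, which is roughly 1.4%, while one free bed out of 10 is 10%. The helper name `free_share` is made up for this sketch.

```python
def free_share(occupancy, capacity):
    """Fraction of beds still free, between 0 and 1."""
    return (capacity - occupancy) / capacity

shelter_a = free_share(68, 69)   # 1 free bed out of 69
shelter_b = free_share(9, 10)    # 1 free bed out of 10

# The .1% format multiplies by 100 and keeps one decimal place.
print(f"Shelter A: {shelter_a:.1%} free")
print(f"Shelter B: {shelter_b:.1%} free")
```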

---
What I'll try to find out:
1. Which shelters sit at or above the 90th percentile (the top 10%) for occupancy as a percentage of capacity?
2. Of that top 10%, which 3 shelters most often hit 100% capacity?
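
A minimal sketch of these two steps, under a couple of assumptions of mine: "capacity percentage" is taken to mean a shelter's mean daily OCCUPANCY / CAPACITY, and "hitting 100%" means a day where occupancy is at least capacity. The rows are invented for illustration (four made-up shelters, three days each), and the `RATE`, `FULL`, `mean_rate`, and `full_days` names are hypothetical.

```python
import pandas as pd

# Hypothetical daily rows for four made-up shelters (3 days each).
rows = pd.DataFrame({
    'SHELTER_NAME': ['A'] * 3 + ['B'] * 3 + ['C'] * 3 + ['D'] * 3,
    'OCCUPANCY':    [69, 68, 69,   9, 9, 10,   20, 25, 30,   50, 50, 50],
    'CAPACITY':     [69, 69, 69,  10, 10, 10,  40, 40, 40,   50, 50, 50],
})
rows['RATE'] = rows['OCCUPANCY'] / rows['CAPACITY']
# 1 on days where the shelter is at (or over) capacity, else 0.
rows['FULL'] = (rows['OCCUPANCY'] >= rows['CAPACITY']).astype(int)

per_shelter = rows.groupby('SHELTER_NAME').agg(
    mean_rate=('RATE', 'mean'),
    full_days=('FULL', 'sum'),
)

# Step 1: keep shelters at or above the 90th percentile of mean rate.
cutoff = per_shelter['mean_rate'].quantile(0.90)
top = per_shelter[per_shelter['mean_rate'] >= cutoff]

# Step 2: of those, the (up to) 3 shelters with the most days at capacity.
most_often_full = top.nlargest(3, 'full_days')
print(most_often_full)
```

With only four shelters the top 10% is a single shelter, so this mostly shows the shape of the calculation. The full dataset should leave enough shelters above the cutoff for the top-3 step to matter.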

In [12]:
print(data.iloc[0])
print(data.iloc[1])

OCCUPANCY_DATE                               01/01/2017
ORGANIZATION_NAME              COSTI Immigrant Services
SHELTER_NAME                     COSTI Reception Centre
SHELTER_ADDRESS                   100 Lippincott Street
SHELTER_CITY                                    Toronto
SHELTER_PROVINCE                                     ON
SHELTER_POSTAL_CODE                             M5S 2P1
FACILITY_NAME                    COSTI Reception Centre
PROGRAM_NAME           COSTI Reception Ctr CITY Program
SECTOR                                            Co-ed
OCCUPANCY                                            16
CAPACITY                                             16
Name: 0, dtype: object
OCCUPANCY_DATE                                         01/01/2017
ORGANIZATION_NAME         Christie Ossington Neighbourhood Centre
SHELTER_NAME                      Christie Ossington Men's Hostel
SHELTER_ADDRESS                              973 Lansdowne Avenue
SHELTER_CITY                             

## Index of relevant values
+ Date collected (OCCUPANCY_DATE) at [i,0]
+ Shelter name at [i,2]
+ Shelter address at [i,3]
+ Shelter occupancy at [i,10]
+ Shelter capacity at [i,11]
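
Positional indices like these stop working if the column order ever changes, so looking values up by column name may be the safer habit. The sketch below rebuilds the first printed row by hand (so it runs without the CSV) and compares both styles; `demo` and the other variable names are just for illustration.

```python
import pandas as pd

# The first row printed earlier, rebuilt by hand so the snippet runs without the CSV.
first_row = {
    'OCCUPANCY_DATE': '01/01/2017',
    'ORGANIZATION_NAME': 'COSTI Immigrant Services',
    'SHELTER_NAME': 'COSTI Reception Centre',
    'SHELTER_ADDRESS': '100 Lippincott Street',
    'SHELTER_CITY': 'Toronto',
    'SHELTER_PROVINCE': 'ON',
    'SHELTER_POSTAL_CODE': 'M5S 2P1',
    'FACILITY_NAME': 'COSTI Reception Centre',
    'PROGRAM_NAME': 'COSTI Reception Ctr CITY Program',
    'SECTOR': 'Co-ed',
    'OCCUPANCY': 16,
    'CAPACITY': 16,
}
demo = pd.DataFrame([first_row])

# Positional lookup, matching the [i,10] / [i,11] notes.
by_position = (demo.iloc[0, 10], demo.iloc[0, 11])

# The same values by column name, which survives a change in column order.
by_name = (demo.loc[0, 'OCCUPANCY'], demo.loc[0, 'CAPACITY'])

# get_loc maps a column name back to its position.
occupancy_pos = demo.columns.get_loc('OCCUPANCY')
print(by_position, by_name, occupancy_pos)
```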


In [14]:
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['One', 'Orange', 'C', 'D'])
df

| | One | Orange | C | D |
|---|---|---|---|---|
| 2000-01-01 | 1.453336 | 0.591906 | 1.444163 | 0.829423 |
| 2000-01-02 | 0.85591 | -0.196669 | -0.059823 | -0.468265 |
| 2000-01-03 | -0.326278 | 0.372777 | -0.091989 | 0.83423 |
| 2000-01-04 | 0.557051 | -0.050783 | -2.234943 | 0.7463 |
| 2000-01-05 | -1.232732 | 1.29835 | -1.547637 | -1.55951 |
| 2000-01-06 | 1.351569 | -0.024005 | 0.032072 | -1.544221 |
| 2000-01-07 | -0.635505 | 0.9301 | -1.560666 | 0.32239 |
| 2000-01-08 | -2.039134 | -0.146208 | 1.021202 | -0.019265 |


In [18]:
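# pd.Panel.to_frame is an instance method, so passing it a dict raised
# AttributeError ('dict' object has no attribute 'shape'); Panel is also gone from current pandas.
panel = pd.concat({'one': df, 'two': df - df.mean()}, axis=1)
panel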