# <center> Dog Pupularity Patterns</center>

<center>By: Sydney Smith, Jared Mohammed,
 Ally Cronander, Marilyn Lalrindiki
</center>


---

<img src="https://s.abcnews.com/images/Health/puppies-01-stock-gty-jef-180920_hpMain_16x9_1600.jpg" width=500px>

## Introduction

Dog park popularity varies with the day and time. Ingham County dog parks operate under a locked gate and key fob system to ensure all dogs entering the park are registered and vaccinated. Data from these gates has been recorded, including who comes in, who leaves, and at what times. This data can be used to model the popularity of the park.

This analysis focuses on three questions:
* How many scans are recorded per year?

* What is the most popular day to visit the dog park?

* At what times of day do visitors scan in?

## Coding

In [2]:
#imports
import math 
import numpy as np
import matplotlib.pyplot as plt

#makes matplotlib (and pandas) plots show up in the notebook
%matplotlib inline

#the following two lines make all plots show up in high-resolution formats
#(newer IPython versions deprecate this import in favor of matplotlib_inline.backend_inline)
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('pdf', 'svg')

#actually import the Pandas module
import pandas as pd

In [4]:
dog_park = pd.read_csv('dogparkdata_2012to2018.csv')
dog_park

| Unnamed: 0.1 | Unnamed: 0 | names | times | valid |
|---|---|---|---|---|
| 0 | 0 | bdjdaabeacjjbibbafjcjdjfaibib | 1/1/2012 8:31:46AM | Valid Access |
| 1 | 1 | abjejijgjejjbiagjebib | 1/1/2012 10:27:20AM | Valid Access |
| 2 | 2 | abjejijgjejjbiagjebib | 1/1/2012 10:52:26AM | Valid Access |
| 3 | 3 | jeaajfaijijdjjbijdjcbdjcbbbbaabib | 1/1/2012 10:52:48AM | Valid Access |
| 4 | 4 | jeaajfaijijdjjbijdjcbdjcbbbbaabib | 1/1/2012 11:19:56AM | Valid Access |
| 5 | 5 | acjhbbahjhbeabjibejjbiahjdjhabjejhjcbib | 1/1/2012 11:20:13AM | Valid Access |
| 6 | 6 | acjhbbahjhbeabjibejjbiahjdjhabjejhjcbib | 1/1/2012 11:35:37AM | Valid Access |
| 7 | 7 | acjijbbejhbeaejjbiabaabcaabebeaaafbib | 1/1/2012 12:48:51PM | Valid Access |
| 8 | 8 | acjijbbejhbeaejjbiabaabcaabebeaaafbib | 1/1/2012 1:22:28PM | Valid Access |
| 9 | 9 | afjcjdadjijdacjjbiagjcaabebibgbh | 1/1/2012 1:51:26PM | Valid Access |


In [5]:
# make the data more readable
# make dataframe into an array & split based on year
dog_park_array = np.array(dog_park)
data_2012 = dog_park_array[0:10891]
data_2013 = dog_park_array[10891:20945]
data_2014 = dog_park_array[20945: 32605]
data_2015 = dog_park_array[32605: 49340]
data_2016 = dog_park_array[49340:70150]
data_2017 = dog_park_array[70150:87560]
data_2018 = dog_park_array[87560:110066]
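
The hard-coded row ranges above only hold for this exact version of the CSV. As a minimal sketch of an alternative, the rows could be grouped by the year found in each timestamp, which also answers the scans-per-year question directly. The function `split_by_year`, its `time_col` parameter, and the sample rows are hypothetical and not part of the original notebook; the sketch assumes timestamps in the `M/D/YYYY H:MM:SSAM` form shown in the table.

```python
from collections import defaultdict

def split_by_year(rows, time_col):
    """Group rows by the year in their 'M/D/YYYY H:MM:SSAM' timestamp.

    time_col is the position of the timestamp within each row (an assumption
    the caller has to supply, since it depends on how the data was loaded).
    """
    by_year = defaultdict(list)
    for row in rows:
        date_part = row[time_col].split()[0]   # e.g. '1/1/2012'
        year = int(date_part.split('/')[2])
        by_year[year].append(row)
    return dict(by_year)

# hypothetical sample rows: (name, timestamp)
sample_rows = [
    ('visitor_a', '1/1/2012 8:31:46AM'),
    ('visitor_a', '12/31/2012 10:27:20PM'),
    ('visitor_b', '1/2/2013 9:05:00AM'),
]

per_year = split_by_year(sample_rows, time_col=1)
scans_per_year = {year: len(rows) for year, rows in per_year.items()}
print(scans_per_year)
```

With the real data, `per_year[2013]` would play the role of `data_2013` without any manually counted indices.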

In [6]:
# parse the date/time string
# this function returns a list of times in 24-hour (military) format: [hour, minute, second]
def correct_time(array):
    correct_times = []
    for row in array:
        date_time = row[2]
        time = date_time[-10:-2]   # e.g. ' 8:31:46' or '10:27:20'
        am_pm = date_time[-2]      # 'A' or 'P'
        hour = int(time[0:2])
        minute = int(time[3:5])
        second = int(time[6:8])
        if am_pm == 'P' and hour != 12:
            hour += 12             # 1PM-11PM become 13-23
        elif am_pm == 'A' and hour == 12:
            hour = 0               # 12AM becomes 0
        correct_times.append([hour, minute, second])
    return correct_times

In [7]:
# parsing the date/time string, continued
# this function returns a list of dates in the format [month, day, year]
def correct_date(array):
    correct_dates = []
    for row in array:
        date_time = row[2]
        # take everything before the first whitespace so that dates of any width
        # ('1/1/2012' through '12/31/2012') parse cleanly
        date = date_time.split()[0]
        month, day, year = (int(part) for part in date.split('/'))
        correct_dates.append([month, day, year])
    return correct_dates
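
The two parsing functions above slice the timestamp string by hand. As a hedged alternative, the standard library's `datetime.strptime` can parse the same string in one step, assuming the timestamps always follow the `M/D/YYYY H:MM:SSAM` pattern seen in the data. The helper name `parse_scan` is hypothetical.

```python
from datetime import datetime

def parse_scan(stamp):
    """Parse a scan timestamp such as '1/1/2012 8:31:46AM' into a datetime."""
    # collapse any repeated spaces first, in case the raw strings are padded
    cleaned = " ".join(stamp.split())
    return datetime.strptime(cleaned, "%m/%d/%Y %I:%M:%S%p")

scan = parse_scan("1/1/2012 1:22:28PM")
time_parts = [scan.hour, scan.minute, scan.second]   # 24-hour clock
date_parts = [scan.month, scan.day, scan.year]
midnight = parse_scan("1/1/2012 12:05:00AM")          # 12AM maps to hour 0
print(time_parts, date_parts, midnight.hour)
```

A `datetime` object also exposes `weekday()`, which may be useful for the most-popular-day question.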

In [8]:
times_2012 = correct_time(data_2012)
dates_2012 = correct_date(data_2012)
times_2013 = correct_time(data_2013)
dates_2013 = correct_date(data_2013)
times_2014 = correct_time(data_2014)
dates_2014 = correct_date(data_2014)
times_2015 = correct_time(data_2015)
dates_2015 = correct_date(data_2015)
times_2016 = correct_time(data_2016)
dates_2016 = correct_date(data_2016)
times_2017 = correct_time(data_2017)
dates_2017 = correct_date(data_2017)
times_2018 = correct_time(data_2018)
dates_2018 = correct_date(data_2018)
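
Once the times are in `[hour, minute, second]` form, one possible way to see when visitors scan in is to count entries per hour. This is only a sketch on hypothetical sample values (`sample_times` is made up for illustration); with the real data, a list such as `times_2017` would take its place.

```python
from collections import Counter

# hypothetical sample in the same [hour, minute, second] layout
sample_times = [[8, 31, 46], [10, 27, 20], [10, 52, 26], [13, 22, 28]]

# count how many scans fall in each hour of the day
scans_per_hour = Counter(hour for hour, _minute, _second in sample_times)
busiest_hour, busiest_count = scans_per_hour.most_common(1)[0]
print(busiest_hour, busiest_count)
```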

In [54]:
# next, find the times that people were at the park
# first, define a function that finds the indices spanning a given date
# note: this relies on the following calendar day (day+1) having at least one entry,
# so it raises a ValueError on the last day of a month or before a day with no scans

def find_indices(month,day,year,date_array):
    begin = date_array.index([month,day,year])
    end = date_array.index([month,day+1,year]) 
    return(begin,end)

#test!
find_indices(12,1,2013,dates_2013)
#this returns 2 numbers: the index of the first entry of 12/1/2013, and the index of the first entry of the next day

(9342, 9394)
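
Because `find_indices` looks up `day+1`, it cannot handle the last day of a month or a day followed by a gap in the data. A minimal sketch of a variant that avoids this, assuming the date list is in chronological order so that all entries for one date are contiguous; `find_day_span` and the sample list are hypothetical names used only for illustration.

```python
def find_day_span(month, day, year, date_array):
    """Return (begin, end) so that date_array[begin:end] holds every entry for the date."""
    target = [month, day, year]
    begin = date_array.index(target)   # raises ValueError if the date has no entries
    end = begin
    # walk forward until the date changes or the list ends
    while end < len(date_array) and date_array[end] == target:
        end += 1
    return begin, end

sample_dates = [[1, 30, 2013], [1, 31, 2013], [1, 31, 2013], [2, 1, 2013]]
month_end_span = find_day_span(1, 31, 2013, sample_dates)
last_entry_span = find_day_span(2, 1, 2013, sample_dates)
print(month_end_span, last_entry_span)
```

The returned `end` is exclusive, so it can be used directly as the end of a slice.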

In [144]:
find_indices(1,1,2013,dates_2013)

(0, 39)

In [145]:
find_indices(2,1,2013,dates_2013)

(849, 863)

In [146]:
find_indices(3,1,2013,dates_2013)

(1734, 1762)

In [None]:
find_indices(12,1,2013,dates_2013)

In [55]:
#2013
# each slice ends where the next month begins, since the end of a slice is exclusive
# note: jan-apr slice the parsed date list, while may-dec slice the raw data array

jan2013 = dates_2013[0:849]
feb2013 = dates_2013[849:1734]
mar2013 = dates_2013[1734:2668]
apr2013 = dates_2013[2668:3427]
may2013 = data_2013[3427:4579]
june2013 = data_2013[4579:5793]
jul2013 = data_2013[5793:6469]
aug2013 = data_2013[6469:8184]
#sept2013 = data_2013[] This data does not exist 
oct2013 = data_2013[8184:8293]
nov2013 = data_2013[8293:9342]
dec2013 = data_2013[9342:]

In [56]:
find_indices(12,1,2015,dates_2015)

(15449, 15461)

In [53]:
#2015
# each slice ends where the next month begins, since the end of a slice is exclusive

jan2015 = dates_2015[0:1029]
feb2015 = dates_2015[1029:1562]
mar2015 = dates_2015[1562:2549]
apr2015 = dates_2015[2549:3947]
may2015 = data_2015[3947:5481]
june2015 = data_2015[5481:7085]
jul2015 = data_2015[7085:8546]
aug2015 = data_2015[8546:10350]
sept2015 = data_2015[10350:12126] 
oct2015 = data_2015[12126:13768]
nov2015 = data_2015[13768:15449]
dec2015 = data_2015[15449:]

In [136]:
find_indices(10,18,2017,dates_2017)

(14509, 14521)

In [137]:
find_indices(12,1,2017,dates_2017)

(15942, 16013)

In [139]:
#2017
# each slice ends where the next month begins, since the end of a slice is exclusive

jan2017 = dates_2017[0:1194]
feb2017 = dates_2017[1194:2798]
mar2017 = dates_2017[2798:4391]
apr2017 = dates_2017[4391:6401]
may2017 = data_2017[6401:8535]
june2017 = data_2017[8535:9953]
jul2017 = data_2017[9953:10352]
aug2017 = data_2017[10352:12635]
sept2017 = data_2017[12635:14509] 
oct2017 = data_2017[14509:14710]
nov2017 = data_2017[14710:15942]
dec2017 = data_2017[15942:]

In [143]:
plt.scatter(jan2017[:,0],feb2017[:,0])

TypeError: list indices must be integers or slices, not tuple
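
The error arises because `jan2017` is sliced from `dates_2017`, a plain Python list of `[month, day, year]` entries, and the `[:,0]` column syntax only works on NumPy arrays. The two months also hold different numbers of entries, so they could not be scattered against each other even after conversion. One possible approach, sketched here on hypothetical sample values, is to pull out the day column with a list comprehension and count scans per day, which gives x and y values of equal length.

```python
from collections import Counter

# hypothetical sample in the same [month, day, year] layout as jan2017
sample_month = [[1, 1, 2017], [1, 1, 2017], [1, 2, 2017], [1, 4, 2017]]

# a list of lists has no [:, 1] syntax, so extract the day column explicitly
days = [entry[1] for entry in sample_month]

# count scans per day of the month
scans_per_day = Counter(days)
x_days = sorted(scans_per_day)
y_counts = [scans_per_day[d] for d in x_days]
print(x_days, y_counts)
```

Lists built this way could then be passed to `plt.scatter(x_days, y_counts)` or `plt.bar(x_days, y_counts)`.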

### Question: What is the best time to go to the park based on the date and time?

### Most popular days

### How popularity changes throughout the year

In [1]:
import calendar

In [12]:
c = calendar.TextCalendar(calendar.SUNDAY)

# use a name other than str so the built-in str type is not shadowed
month_text = c.formatmonth(1998,12)
print(month_text)

   December 1998
Su Mo Tu We Th Fr Sa
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31
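
Beyond printing a month, the `calendar` module can map a date to its day of the week, which is one possible route to the most-popular-day question. The sketch below assumes dates in the `[month, day, year]` layout; `weekday_name` and the sample dates are hypothetical.

```python
import calendar
from collections import Counter

def weekday_name(month, day, year):
    """Return the weekday name for a date; calendar.weekday counts Monday as 0."""
    return calendar.day_name[calendar.weekday(year, month, day)]

# hypothetical sample dates in [month, day, year] layout
sample_dates = [[12, 1, 1998], [12, 5, 1998], [12, 12, 1998], [12, 6, 1998]]

visits = Counter(weekday_name(m, d, y) for m, d, y in sample_dates)
busiest_day, busiest_count = visits.most_common(1)[0]
print(busiest_day, busiest_count)
```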



In [16]:
#Seeing the data more clearly by saving it into its own file
#(assumed layout: one "hour,minute,second" line per scan)
with open("data.csv","w") as out_file:
    for hour, minute, second in times_2017: #times_2017 is defined in Sydney's notebook
        out_file.write(f"{hour},{minute},{second}\n")


NameError: name 'times_2017' is not defined

In [None]:
!cat data.csv

In [1]:
# https://stackoverflow.com/questions/13784192/creating-an-empty-pandas-dataframe-then-filling-it

In [3]:
import pandas as pd
newDF = pd.DataFrame() #creates a new dataframe that's empty
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
newDF = pd.concat([newDF, oldDF], ignore_index = True) # ignoring index is optional
# try printing some data from newDF
#print(newDF.head()) #again optional 

NameError: name 'oldDF' is not defined

---