# Data!
## Here we import our data and transform it into clean dataframes or geodataframes for mapping. Then we take a quick look at it and remove nulls (geodata cannot have any NaNs or nulls). 

## Further data manipulations are performed in the other notebooks, since each Python package may need the data in a different format.

In [1]:
import json
import requests
import pandas as pd
import numpy as np
import geopandas as gpd

## The next few cells are examples of how to find out what directory you are in (useful for importing data and saving data), and what python you are working with.


### Some people like to set their path for loading data.  Here you can find out what directory you are in
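
The cell for this step is empty in this export. A minimal sketch of what it might contain, using only the standard library (the names `cwd` and `data_path` are my own, and the CSV is only checked for, not required):

```python
import os
from pathlib import Path

# Directory that relative paths such as "PizzaEssentials.csv" are resolved against
cwd = Path.cwd()
print(cwd)

# os.getcwd() returns the same location as a plain string
print(os.getcwd())

# Check whether a data file is where pandas will look for it
data_path = cwd / "PizzaEssentials.csv"
print(data_path.exists())
```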

### What version do you have?  Geospatial libraries are all heavily dependent on each other's versions
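
The version cell is also empty in this export. One way to check versions without importing each library is `importlib.metadata`; the package list below is my assumption about what this stack uses, and the helper `installed_version` is hypothetical:

```python
import sys
from importlib import metadata

# Python interpreter version
print(sys.version)

def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Assumed package list for the geospatial stack; edit to match your environment
packages = ["pandas", "numpy", "geopandas", "shapely", "fiona", "pyproj"]
versions = {name: installed_version(name) for name in packages}
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```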


###  Find your path for the Python libraries
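
Again the original cell is missing, so this is only a sketch of one way to find where Python and its libraries live, using the standard library (`site_packages` is my own variable name):

```python
import sys
import sysconfig

# The Python executable that is running this notebook
print(sys.executable)

# Directory where third-party packages (pandas, geopandas, ...) are installed
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)

# Full search path Python uses when importing modules
for entry in sys.path:
    print(entry)
```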


## Starting with dataframe creation by reading in csv's
**************

### I created this list of essential pizza destinations in New York City in Google Maps. There is a long backstory to it.  

### To sum it up though, Google Maps does not give you coordinates, only URLs.  So I had to create the Latitude/Longitude coordinates from the exported Google Maps list.  Please refer to the _"Google_Maps_API.ipynb"_ tutorial notebook for how to get coordinates from Google Maps.

In [15]:
dfNY = pd.read_csv("PizzaEssentials.csv")
dfNY.shape

(23, 5)

In [16]:
dfNY.head()

Unnamed: 0,Longitude,Latitude,Name,Comment,URL
0,-73.99324,40.70258,Grimaldi's Pizzeria,also coal fired. A descendant of Lombardi's pi...,https://www.google.com/maps/place/Grimaldi's+P...
1,-73.93488,40.79715,Patsy's Pizzeria,an OG joint in East Harlem. prolly the 4 pizze...,https://www.google.com/maps/place/Patsy's+Pizz...
2,-73.9813,40.59474,L&B Spumoni Gardens,"since 1939, the real deal ice cream of Italy.....",https://www.google.com/maps/place/L%26B+Spumon...
3,-73.99459,40.72301,Prince St. Pizza,NOLITA!!! neighborhood slice joint. reimaginin...,https://www.google.com/maps/place/Prince+St.+P...
4,-73.88825,40.85421,Mario's Restaurant,Old style classic. Pizza is ordered Off-Menu!...,https://www.google.com/maps/place/Mario's/data...


# DOHMH New York City Restaurant Inspection Results
This dataset provides restaurant inspections, violations, grades and adjudication information. It can be found on NYC Open Data, New York City's official open data portal. Who knew they had one, but they do!  This is a very useful site to get lots of data. They also have shapefiles if you prefer.

https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j
    

In [5]:
rest = pd.read_csv("DOHMH_New_York_City_Restaurant_Inspection_Results.csv")

In [6]:
# LOOK AT THE SIZE OF THE DATA
rest.shape

(402052, 26)

In [7]:
# LIST THE HEADERS
list(rest)

['CAMIS',
 'DBA',
 'BORO',
 'BUILDING',
 'STREET',
 'ZIPCODE',
 'PHONE',
 'CUISINE DESCRIPTION',
 'INSPECTION DATE',
 'ACTION',
 'VIOLATION CODE',
 'VIOLATION DESCRIPTION',
 'CRITICAL FLAG',
 'SCORE',
 'GRADE',
 'GRADE DATE',
 'RECORD DATE',
 'INSPECTION TYPE',
 'Latitude',
 'Longitude',
 'Community Board',
 'Council District',
 'Census Tract',
 'BIN',
 'BBL',
 'NTA']

In [8]:
# REMOVE COLUMNS WE WILL NOT NEED
rest = rest.drop(['CAMIS',
 'PHONE',
 'INSPECTION DATE',
 'ACTION',
 'GRADE DATE',
 'RECORD DATE',
 'INSPECTION TYPE',
 'BIN',
 'BBL',
 'NTA'], axis=1) 

### We need to make sure we have no nulls or NaNs in the coordinate columns. We also remove any 0 values for Lat and Long since we know those can't be correct locations!

In [9]:
# remove null rows in Longitude field AND PRINT TO SEE IF THERE ARE ANY REMAINING
rest = rest.dropna(subset = ['Longitude'])
sum(pd.isnull(rest['Longitude']))

0

In [10]:
# WE GOT RID OF 422 ROWS WITH A NULL LONGITUDE (402052 - 401630).
rest.shape

(401630, 16)

In [11]:
# There are many input errors here where the lat and long are 0.0.  Let's remove them
rest = rest[rest.Longitude != 0]
rest = rest[rest.Latitude != 0]

In [12]:
rest = rest[rest.Longitude.notnull()]
rest = rest[rest.Latitude.notnull()]

# there is also DataFrame.query, which reads much like SQL
#  EX---  rest.query('Latitude != 0')
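
The null and zero filters above can be chained into one expression. This is a sketch on a made-up toy frame (the values are not from the inspection data), combining `dropna` with the `query` style mentioned in the comment:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the inspection data (values are made up)
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Latitude": [40.70, 0.0, np.nan, 40.85],
    "Longitude": [-73.99, 0.0, -73.95, -73.88],
})

# Drop rows where either coordinate is missing, then rows where either is 0
clean = (
    toy.dropna(subset=["Latitude", "Longitude"])
       .query("Latitude != 0 and Longitude != 0")
)
print(clean)
```

Passing both columns to `subset` catches a null Latitude as well as a null Longitude in one step.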

In [13]:
rest.shape

(396148, 16)

In [None]:
# rename DBA to Name. The "inplace=True" means you do not have to assign the result back to the variable
rest.rename(columns = {'DBA':'Name'}, inplace=True)
rest.head()


#### NOW LET'S REMOVE STATEN ISLAND SINCE IT IS NOT ON A SUBWAY LINE AND PIZZA ESSENTIALS ARE NOT IN THAT BORO


In [15]:
rest2 = rest.loc[rest['BORO'].isin(['Bronx', 'Manhattan', 'Brooklyn', 'Queens'])]
print(rest2['BORO'].unique())
#rest2['BORO'].nunique()        ### COUNTS UNIQUES IN COLUMN
#rest2['BORO'].value_counts()   ### TO COUNT TOTALS BY VALUE IN COLUMN
rest2.to_csv("NY_All_rest_No_Staten.csv")


['Bronx' 'Manhattan' 'Brooklyn' 'Queens']


#### Next, let's make a subset where the cuisine is described only as Pizza or Italian


In [17]:
restPandI = rest2[(rest2['CUISINE DESCRIPTION'] == 'Pizza') | (rest2['CUISINE DESCRIPTION'] == 'Italian')]
restPandI.shape
restPandI

  


Unnamed: 0,Name,BORO,BUILDING,STREET,ZIPCODE,CUISINE DESCRIPTION,VIOLATION CODE,VIOLATION DESCRIPTION,CRITICAL FLAG,SCORE,GRADE,Latitude,Longitude,Community Board,Council District,Census Tract
62,LITTLE CAESARS,Bronx,1054,SOUTHERN BOULEVARD,10459.0,Pizza,10F,Non-food contact surface improperly constructe...,N,7.0,A,40.824336,-73.891889,202.0,17.0,11900.0
70,NINO'S PIZZA,Brooklyn,9110,3 AVENUE,11209.0,Pizza,04L,Evidence of mice or live mice present in facil...,Y,47.0,,40.619722,-74.032595,310.0,43.0,5800.0
83,CAFE LORE,Brooklyn,4601,4 AVENUE,11220.0,Italian,04E,"Toxic chemical improperly labeled, stored or u...",Y,34.0,,40.648612,-74.010325,307.0,38.0,8000.0
97,RISTORANTE SETTEPANI,Manhattan,196,LENOX AVENUE,10026.0,Italian,10H,Proper sanitization not provided for utensil w...,N,11.0,A,40.804435,-73.947906,110.0,9.0,20000.0
104,BOTTEGA RESTAURANT,Manhattan,1331,2 AVENUE,10021.0,Italian,10F,Non-food contact surface improperly constructe...,N,22.0,,40.767632,-73.959227,108.0,4.0,12600.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
401983,PAPA JOHN'S,Brooklyn,529,STANLEY AVE,11207.0,Pizza,18G,Manufacture of frozen dessert not authorized o...,N,,,40.657246,-73.888801,305.0,42.0,110400.0
401984,CAPPONE'S,Manhattan,75,9TH AVE,10011.0,Italian,06C,Food not protected from potential source of co...,Y,19.0,,40.741869,-74.004713,104.0,3.0,8300.0
402023,DAGAN PIZZA,Brooklyn,6187,STRICKLAND AVENUE,11234.0,Pizza,08A,Facility not vermin proof. Harborage or condit...,N,13.0,A,40.613199,-73.912157,318.0,46.0,69800.0
402026,DOMINO'S,Bronx,315,E 204TH ST,10467.0,Pizza,08A,Facility not vermin proof. Harborage or condit...,N,12.0,A,40.872737,-73.878323,207.0,11.0,42500.0


In [18]:
restPandI.to_csv('restPandI.csv',encoding='utf-8')

### Lastly, let's check only Pizza restaurants


In [19]:
restPizza = rest2[rest2['CUISINE DESCRIPTION'] == 'Pizza'] 
restPizza.shape


(16466, 16)

#### COUNT THE UNIQUE RESTAURANT NAMES SO EACH CHAIN IS ONLY COUNTED ONCE


In [20]:
unique_arr = restPizza["Name"].unique()
len(unique_arr)

851
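
Counting unique names does not remove any rows, and the inspection data has one row per violation, so the same restaurant appears many times. If you actually want to drop chains, one possible approach (my assumption, not something the notebook does) is to treat any name that appears at more than one address as a chain. The toy rows below are made up:

```python
import pandas as pd

# Toy stand-in: one row per violation, so a restaurant can appear many times
toy = pd.DataFrame({
    "Name": ["DOMINO'S", "DOMINO'S", "DOMINO'S", "NINO'S PIZZA", "NINO'S PIZZA"],
    "BUILDING": ["315", "315", "529", "9110", "9110"],
    "STREET": ["E 204TH ST", "E 204TH ST", "STANLEY AVE", "3 AVENUE", "3 AVENUE"],
})

# One row per physical location (name + address)
locations = toy.drop_duplicates(subset=["Name", "BUILDING", "STREET"])

# Names that appear at more than one address are treated as chains here
locations_per_name = locations["Name"].value_counts()
chains = locations_per_name[locations_per_name > 1].index
independents = locations[~locations["Name"].isin(chains)]
print(independents)
```

This heuristic misses chains with only one NYC location and would wrongly flag an independent that moved, so treat it as a starting point.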

In [21]:
restPizza.to_csv('restPizza.csv',encoding='utf-8')

## Now Subway Entrances from NYC records
  https://data.cityofnewyork.us/Transportation/Subway-Entrances/drex-xx56
  
## or the Subway Station locations
  https://data.cityofnewyork.us/Transportation/Subway-Stations/arq3-7z49
 

In [55]:
subwayStation = pd.read_csv("DOITT_SUBWAY_STATION_01_13SEPT2010.csv")

In [56]:
subwayStation.shape

(473, 6)

In [57]:
subwayStation.head()

Unnamed: 0,URL,OBJECTID,NAME,the_geom,LINE,NOTES
0,http://web.mta.info/nyct/service/,1,Astor Pl,POINT (-73.99106999861966 40.73005400028978),4-6-6 Express,"4 nights, 6-all times, 6 Express-weekdays AM s..."
1,http://web.mta.info/nyct/service/,2,Canal St,POINT (-74.00019299927328 40.71880300107709),4-6-6 Express,"4 nights, 6-all times, 6 Express-weekdays AM s..."
2,http://web.mta.info/nyct/service/,3,50th St,POINT (-73.98384899986625 40.76172799961419),1-2,"1-all times, 2-nights"
3,http://web.mta.info/nyct/service/,4,Bergen St,POINT (-73.97499915116808 40.68086213682956),2-3-4,"4-nights, 3-all other times, 2-all times"
4,http://web.mta.info/nyct/service/,5,Pennsylvania Ave,POINT (-73.89488591154061 40.66471445143568),3-4,"4-nights, 3-all other times"


### Use regex to keep only what is inside the '( )'

### After I converted this geodataframe to a regular dataframe I learned that there is one line of code that can do the same thing.  But I kept this code here as an example of how to extract using regex, because _we all_ need more regex :)

In [58]:
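# raw string so the backslashes reach the regex engine; expand=False returns a Series
subwayStation['coords'] = subwayStation['the_geom'].str.extract(r'.*\((.*)\).*', expand=False)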
subwayStation.head()


Unnamed: 0,URL,OBJECTID,NAME,the_geom,LINE,NOTES,coords
0,http://web.mta.info/nyct/service/,1,Astor Pl,POINT (-73.99106999861966 40.73005400028978),4-6-6 Express,"4 nights, 6-all times, 6 Express-weekdays AM s...",-73.99106999861966 40.73005400028978
1,http://web.mta.info/nyct/service/,2,Canal St,POINT (-74.00019299927328 40.71880300107709),4-6-6 Express,"4 nights, 6-all times, 6 Express-weekdays AM s...",-74.00019299927328 40.71880300107709
2,http://web.mta.info/nyct/service/,3,50th St,POINT (-73.98384899986625 40.76172799961419),1-2,"1-all times, 2-nights",-73.98384899986625 40.76172799961419
3,http://web.mta.info/nyct/service/,4,Bergen St,POINT (-73.97499915116808 40.68086213682956),2-3-4,"4-nights, 3-all other times, 2-all times",-73.97499915116808 40.68086213682956
4,http://web.mta.info/nyct/service/,5,Pennsylvania Ave,POINT (-73.89488591154061 40.66471445143568),3-4,"4-nights, 3-all other times",-73.89488591154061 40.66471445143568


In [59]:
# Create two lists for the loop results to be placed
lat = []
lon = []

# For each row in the coords column,
for row in subwayStation['coords']:
    # Try to,
    try:
        # Split the row on the space: WKT order is "longitude latitude"
        x, y = row.split(' ')
        # append everything before the space to lon
        lon.append(x)
        # append everything after the space to lat
        lat.append(y)
    # But if the row is missing (NaN) or does not split into two parts
    except (AttributeError, ValueError):
        # append a missing value to lat
        lat.append(np.nan)
        # append a missing value to lon
        lon.append(np.nan)

# Create two new columns from lat and lon
subwayStation['Y'] = lat
subwayStation['X'] = lon

In [60]:
subwayStation.head()

Unnamed: 0,URL,OBJECTID,NAME,the_geom,LINE,NOTES,coords,Y,X
0,http://web.mta.info/nyct/service/,1,Astor Pl,POINT (-73.99106999861966 40.73005400028978),4-6-6 Express,"4 nights, 6-all times, 6 Express-weekdays AM s...",-73.99106999861966 40.73005400028978,40.73005400028978,-73.99106999861966
1,http://web.mta.info/nyct/service/,2,Canal St,POINT (-74.00019299927328 40.71880300107709),4-6-6 Express,"4 nights, 6-all times, 6 Express-weekdays AM s...",-74.00019299927328 40.71880300107709,40.71880300107709,-74.00019299927328
2,http://web.mta.info/nyct/service/,3,50th St,POINT (-73.98384899986625 40.76172799961419),1-2,"1-all times, 2-nights",-73.98384899986625 40.76172799961419,40.76172799961419,-73.98384899986625
3,http://web.mta.info/nyct/service/,4,Bergen St,POINT (-73.97499915116808 40.68086213682956),2-3-4,"4-nights, 3-all other times, 2-all times",-73.97499915116808 40.68086213682956,40.68086213682956,-73.97499915116808
4,http://web.mta.info/nyct/service/,5,Pennsylvania Ave,POINT (-73.89488591154061 40.66471445143568),3-4,"4-nights, 3-all other times",-73.89488591154061 40.66471445143568,40.66471445143568,-73.89488591154061
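
The regex-plus-loop steps above can also be done in a single vectorized extract. This is not necessarily the one-liner the text alludes to, just one possibility; it uses two capture groups and converts to float, whereas the loop above keeps the values as strings. The two WKT strings are copied from the station table:

```python
import pandas as pd

# Two WKT strings copied from the station table above
toy = pd.DataFrame({
    "the_geom": [
        "POINT (-73.99106999861966 40.73005400028978)",
        "POINT (-74.00019299927328 40.71880300107709)",
    ]
})

# Capture the two numbers inside the parentheses; WKT order is "longitude latitude"
coords = toy["the_geom"].str.extract(r"\(([-\d.]+) ([-\d.]+)\)").astype(float)
toy["Longitude"] = coords[0]
toy["Latitude"] = coords[1]
print(toy[["Longitude", "Latitude"]])
```

Rows that do not match the pattern come back as NaN, which plays the same role as the `except` branch in the loop.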


In [61]:
# remove the_geom column
subwayStation = subwayStation.drop('the_geom', axis=1)

In [64]:
# rename X,Y columns to Long,Lat
subwayStation.rename(columns={'X': 'Longitude', 'Y': 'Latitude'}, inplace=True)
subwayStation.head()

Unnamed: 0,URL,OBJECTID,NAME,LINE,NOTES,coords,Latitude,Longitude
0,http://web.mta.info/nyct/service/,1,Astor Pl,4-6-6 Express,"4 nights, 6-all times, 6 Express-weekdays AM s...",-73.99106999861966 40.73005400028978,40.73005400028978,-73.99106999861966
1,http://web.mta.info/nyct/service/,2,Canal St,4-6-6 Express,"4 nights, 6-all times, 6 Express-weekdays AM s...",-74.00019299927328 40.71880300107709,40.71880300107709,-74.00019299927328
2,http://web.mta.info/nyct/service/,3,50th St,1-2,"1-all times, 2-nights",-73.98384899986625 40.76172799961419,40.76172799961419,-73.98384899986625
3,http://web.mta.info/nyct/service/,4,Bergen St,2-3-4,"4-nights, 3-all other times, 2-all times",-73.97499915116808 40.68086213682956,40.68086213682956,-73.97499915116808
4,http://web.mta.info/nyct/service/,5,Pennsylvania Ave,3-4,"4-nights, 3-all other times",-73.89488591154061 40.66471445143568,40.66471445143568,-73.89488591154061


In [123]:
subwayStation = subwayStation.drop('OBJECTID', axis=1) 
# adding the TYPE column to give us a way to join this data to the restaurant data
subwayStation['TYPE'] = 'subway'
subwayStation.head()

Unnamed: 0,URL,NAME,LINE,NOTES,coords,Latitude,Longitude,TYPE
0,http://web.mta.info/nyct/service/,Astor Pl,4-6-6 Express,"4 nights, 6-all times, 6 Express-weekdays AM s...",-73.99106999861966 40.73005400028978,40.73005400028978,-73.99106999861966,subway
1,http://web.mta.info/nyct/service/,Canal St,4-6-6 Express,"4 nights, 6-all times, 6 Express-weekdays AM s...",-74.00019299927328 40.71880300107709,40.71880300107709,-74.00019299927328,subway
2,http://web.mta.info/nyct/service/,50th St,1-2,"1-all times, 2-nights",-73.98384899986625 40.76172799961419,40.76172799961419,-73.98384899986625,subway
3,http://web.mta.info/nyct/service/,Bergen St,2-3-4,"4-nights, 3-all other times, 2-all times",-73.97499915116808 40.68086213682956,40.68086213682956,-73.97499915116808,subway
4,http://web.mta.info/nyct/service/,Pennsylvania Ave,3-4,"4-nights, 3-all other times",-73.89488591154061 40.66471445143568,40.66471445143568,-73.89488591154061,subway


In [124]:
# save csv for future use
subwayStation.to_csv('subwayStations.csv',encoding='utf-8')

# Now just for future fun, let's combine the dataframes into one that can be called for plotting the whole thing!

In [27]:
# drop unneeded columns
rest3 = rest2.drop([ 'VIOLATION CODE',
 'VIOLATION DESCRIPTION',
 'CRITICAL FLAG',
 'SCORE',
 'GRADE',
 'Community Board',
 'Council District',
 'Census Tract'], axis=1)

In [28]:
rest3

Unnamed: 0,Name,BORO,BUILDING,STREET,ZIPCODE,CUISINE DESCRIPTION,Latitude,Longitude
0,MONTEZUMA RESTAURANT,Bronx,119,WEST KINGSBRIDGE ROAD,10468.0,Mexican,40.868621,-73.902168
1,RESTAURANT RIKI,Manhattan,141,EAST 45 STREET,10017.0,Japanese,40.753164,-73.974161
2,WASABI RESTAURANT,Brooklyn,7222,18 AVENUE,11204.0,Japanese,40.614953,-73.994406
3,CHINA WOK,Brooklyn,5813,AVENUE T,11234.0,Chinese,40.615279,-73.918465
4,EDDY'S FRESH BREAKFAST & LUNCH,Queens,7509,WOODHAVEN BLVD,11385.0,American,40.708623,-73.859395
...,...,...,...,...,...,...,...,...
402047,ORTZI RESTAURANT AND BAR LOCATED INSIDE LOUMA ...,Manhattan,120,W 41ST ST,10036.0,Basque,40.754522,-73.985458
402048,AUNTIE ANNE'S/CINNABON,Queens,6135,JUNCTION BLVD,11374.0,Bakery,40.733428,-73.864170
402049,DESERT RAIN LOUNGE,Queens,10729,METROPOLITAN AVE,11375.0,American,40.709560,-73.845243
402050,THE ARCH DINER,Brooklyn,1866,RALPH AVENUE,11236.0,American,40.631196,-73.918521


In [32]:
# ADD THE TYPE COLUMN TO JOIN ON WITH THE SUBWAY DF
rest3['TYPE'] = 'restaurant'
rest3.head()
# save csv for future use
rest3.to_csv('NY_All_restaurants.csv',encoding='utf-8')

In [35]:

subway = pd.read_csv('subwayStations.csv',encoding='utf-8')
subway.rename(columns = {'NAME':'Name', 'NOTES':"Comments"}, inplace=True)
 
allNY = pd.concat([rest3, subway])
#allNY.columns
#allNY.shape
allNY.head()
allNY.to_csv("ALL_Restaurant_Subway_Locations.csv", encoding='utf-8')

In [36]:
list(allNY)
allNY

Unnamed: 0.1,Name,BORO,BUILDING,STREET,ZIPCODE,CUISINE DESCRIPTION,Latitude,Longitude,TYPE,Unnamed: 0,URL,LINE,Comments,coords
0,MONTEZUMA RESTAURANT,Bronx,119,WEST KINGSBRIDGE ROAD,10468.0,Mexican,40.868621,-73.902168,restaurant,,,,,
1,RESTAURANT RIKI,Manhattan,141,EAST 45 STREET,10017.0,Japanese,40.753164,-73.974161,restaurant,,,,,
2,WASABI RESTAURANT,Brooklyn,7222,18 AVENUE,11204.0,Japanese,40.614953,-73.994406,restaurant,,,,,
3,CHINA WOK,Brooklyn,5813,AVENUE T,11234.0,Chinese,40.615279,-73.918465,restaurant,,,,,
4,EDDY'S FRESH BREAKFAST & LUNCH,Queens,7509,WOODHAVEN BLVD,11385.0,American,40.708623,-73.859395,restaurant,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
468,Coney Island - Stillwell Av,,,,,,40.577281,-73.981236,subway,468.0,http://web.mta.info/nyct/service/,D-F-N-Q,"D,F,N,Q-all times",-73.9812359981396 40.57728100006751
469,34th St - Hudson Yards,,,,,,40.755446,-74.002197,subway,469.0,http://web.mta.info/nyct/service/,7-7 Express,"7-all times, 7 Express-rush hours AM westbound...",-74.00219709442206 40.75544635961596
470,72nd St,,,,,,40.768803,-73.958362,subway,470.0,http://web.mta.info/nyct/service/,Q,Q-all times,-73.95836178682246 40.76880251014895
471,86th St,,,,,,40.777861,-73.951771,subway,471.0,http://web.mta.info/nyct/service/,Q,Q-all times,-73.95177090964917 40.77786104333163
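
The NaNs in the combined table come from how `pd.concat` works: it keeps the union of the columns and fills in NaN wherever one frame has no such column. A small sketch with two made-up single-row frames (column names follow the tables above):

```python
import pandas as pd

restaurants = pd.DataFrame({
    "Name": ["CHINA WOK"], "BORO": ["Brooklyn"],
    "Latitude": [40.615279], "Longitude": [-73.918465], "TYPE": ["restaurant"],
})
stations = pd.DataFrame({
    "Name": ["Astor Pl"], "LINE": ["4-6-6 Express"],
    "Latitude": [40.730054], "Longitude": [-73.991070], "TYPE": ["subway"],
})

# concat keeps the union of columns; cells with no source value become NaN.
# ignore_index=True gives the combined frame a fresh 0..n-1 index.
combined = pd.concat([restaurants, stations], ignore_index=True)
print(combined)

# TYPE lets you split the combined frame back apart for plotting
print(combined[combined["TYPE"] == "subway"])
```

Without `ignore_index=True` the original row labels are kept, which is why the table above shows index 0 to 4 for restaurants and 468 to 471 for stations.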


#          DONE!  Now we have cleaned data saved as csv's.
##                              Now go to next notebook and plot the data!

# ---------                     Actually   -------------------------
### Let's show some other necessary things that had to be done with the data to prepare it for geospatial analysis and visualization

### *To start:* make a geodataframe from a dataframe and clean it up and set EPSG projection

## For Geodataframe creation you can read in a .shp or a .kml file.  
## Or you can create the geodataframe from a dataframe by adding a geometry column, then setting the "crs" projection.

import geopandas as gpd
a = gpd.read_file("/path/to/your/doc.kml")


As a warning, this method is easy but sometimes you lose data (not all lines are read in). 
Because it's so easy, I think it's at least worth a shot.

That code will load your KML as a geopandas.GeoDataFrame; geopandas is a POWERFUL geospatial analysis library in Python. With a stack of geopandas, shapely (for geospatial ops), pysal (for advanced geospatial analytics), rasterio (raster analysis), datashader (visualize large datasets such as lidar in milliseconds) and fiona, you can often do without PostGIS, ArcGIS or any other GIS tool.

In [None]:
df = pd.read_csv('PizzaEssentials.csv' ,encoding='utf-8')
df.head()

### IF YOU NEED TO ADD AN INDEX INT COLUMN TO REORDER THEM
### (insert modifies df in place and returns None, so do not assign the result)
#df.insert(0, 'New_ID', range(len(df)))
#df


In [None]:
# Make dataframe of points for GeoPandas to read in
EEE = pd.read_csv("PizzaTSP.csv")
# THESE UNNAMED INDEXES KEEP HAPPENING WITH PANDAS IMPORTS. SO HERE WE COPY THEM TO NAMED COLUMNS (THE UNNAMED ONES ARE DROPPED IN THE NEXT CELL)
EEE['Route'] = EEE['Unnamed: 0.1']
EEE['ID'] = EEE['Unnamed: 0']

#EEE.drop(columns='Unnamed: 0')
EEE
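
The "Unnamed: 0" columns appear because `to_csv` writes the index as an unlabeled first column by default, and `read_csv` then loads it as ordinary data. A sketch of the round trip and two ways to avoid it, using an in-memory buffer instead of a real file (the single row is taken from the pizza table):

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Prince St. Pizza"], "Latitude": [40.72301]})

# Default to_csv writes the index as an unlabeled first column...
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
# ...which read_csv then loads as a regular column named "Unnamed: 0"
with_extra = pd.read_csv(buffer)
print(list(with_extra.columns))

# Option 1: do not write the index at all
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
no_index = pd.read_csv(buffer)

# Option 2: tell read_csv that the first column is the index
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
as_index = pd.read_csv(buffer, index_col=0)
```

Saving and reloading twice without either option is what produces "Unnamed: 0.1".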


In [None]:
# MAKE THE GDF LESS CLUTTERED
EEE=EEE.drop(['Unnamed: 0.1', 'Unnamed: 0'],axis=1)

# REMOVE UNWANTED LOCATIONS THAT ARE IN THE SAME AREA AS EACH OTHER
EEE = EEE[EEE.Name != 'Prince Coffee House']
EEE = EEE[EEE.Name !=  "Mario's Restaurant"]
EEE = EEE[EEE.Name != "Cosenza's Fish Market"]
EEE = EEE[EEE.Name != 'Arthur Avenue Retail Market']
EEE = EEE[EEE.Name != 'Teitel Brothers Market']
EEE = EEE.reset_index(drop=True)

###  reassign the ID to be new order
EEE['ID'] = EEE.index  
EEE
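
The repeated `!=` filters above can be written as one `isin` mask. A sketch on a toy frame (the `stops` frame stands in for `EEE`; the names in `unwanted` are the ones removed above):

```python
import pandas as pd

# Toy stand-in for EEE; names are taken from the cell above plus one to keep
stops = pd.DataFrame({
    "Name": ["Prince Coffee House", "Prince St. Pizza", "Teitel Brothers Market"],
})

unwanted = ["Prince Coffee House", "Mario's Restaurant", "Cosenza's Fish Market",
            "Arthur Avenue Retail Market", "Teitel Brothers Market"]

# ~ negates the mask, keeping rows whose Name is NOT in the list
stops = stops[~stops["Name"].isin(unwanted)].reset_index(drop=True)

# Reassign ID so it follows the new row order
stops["ID"] = stops.index
print(stops)
```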


In [None]:
import geopandas
from shapely.geometry import Point

# Build one shapely Point per row; Point takes (x, y), i.e. (Longitude, Latitude)
geometry = [Point(xy) for xy in zip(EEE['Longitude'], EEE['Latitude'])]

# Convert the dataframe to a geodataframe
points = geopandas.GeoDataFrame(EEE, geometry=geometry)
# MUST SAVE A SHP FOR FUTURE USE AS A GDF
points.to_file("pizzaTSPgeopandas.shp")
points.head()
#type(points)


### TRANSFORM THE COORDS TO A NEW CRS   # WGS84 is 4326   # Web Mercator is 3857
### You MUST first initialize the geodataframe with CRS = WGS84 (since that matches the longitude and latitude coordinates). Only then can you change the CRS.
##  https://stackoverflow.com/questions/38961816/geopandas-set-crs-on-points


In [None]:

# declare that the coordinates are WGS84 longitude/latitude
points = points.set_crs("EPSG:4326")
print(points.crs)
# project to Web Mercator
points = points.to_crs("EPSG:3857")
#points.to_crs("EPSG:3857")
print(points.crs)
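
For intuition about what the reprojection does to the numbers, here is the standard spherical Web Mercator forward formula written out by hand. This is only an illustration; geopandas delegates the real transformation to pyproj, and the function name `wgs84_to_web_mercator` is my own:

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used by EPSG:3857

def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees (EPSG:4326) to x/y in metres (EPSG:3857)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# Grimaldi's Pizzeria, from the PizzaEssentials table
x, y = wgs84_to_web_mercator(-73.99324, 40.70258)
print(round(x), round(y))
```

Degrees become metres, which is why the coordinates change from values like -73.99 to values in the millions after `to_crs`.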


### --- this concludes the GEO dataframe creation using geopandas and shapely

In [None]:
##  ANOTHER WAY TO WRITE FILE
# Use glob to search for any files which match your filename
'''
import glob

filename = "calculated_distances.csv"
files_present = glob.glob(filename)
# if no matching files, write to csv; if there are matching files, warn and write under a new name
if not files_present:
    df.to_csv(filename, index=False)
else:
    print("WARNING: This file already exists!")
    df.to_csv(filename.replace(".csv", "_1.csv"), index=False)
'''
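
A variation on the snippet above, sketched with `pathlib` instead of `glob`: the hypothetical helper `next_free_path` keeps adding a numeric suffix until it finds a name that is not taken, so nothing gets overwritten.

```python
from pathlib import Path

def next_free_path(filename):
    """Return filename if unused, else filename with _1, _2, ... before the suffix."""
    path = Path(filename)
    candidate = path
    counter = 1
    while candidate.exists():
        candidate = path.with_name(f"{path.stem}_{counter}{path.suffix}")
        counter += 1
    return candidate

# Usage with any DataFrame `df` (hypothetical here):
# df.to_csv(next_free_path("calculated_distances.csv"), index=False)
```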