## Introduction

In this notebook we segment the data according to the needs of the study. This step is essential for later training Machine Learning models that are meaningful and run in a feasible compute time.

## Mounting Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%cd /content/drive/MyDrive/AnalisisDeDatos/PracticaFinal/

/content/drive/.shortcut-targets-by-id/17pg768_y2ACoe-WZZ9w9BrFBRth0htfL/PracticaFinal


### Imports and utilities

In [None]:
import pandas as pd

In [None]:
import gc
gc.collect()

## Segmentation

### Businesses

#### 1st segmentation: US vs. Canada

In [None]:
business_filtered = pd.read_csv('data/business/business_filtered.csv', sep=",")

In [None]:
business_filtered

In the original dataframe, two kinds of zip code are mixed together: Canadian and US ones.

Next, we split the data into two separate dataframes.

In [None]:
# Zip codes that parse as numbers are treated as US ones;
# alphanumeric or missing values are treated as Canadian.
is_us_zipcode = pd.to_numeric(business_filtered.zipcode, errors="coerce").notna()

In [None]:
# Split with boolean masks; .copy() yields two independent dataframes.
business_filtered_eeuu = business_filtered[is_us_zipcode].copy()
business_filtered_canada = business_filtered[~is_us_zipcode].copy()

If the zip code is numeric, the business is in the US and goes into that dataframe. Otherwise it is Canadian and goes into the Canada dataframe.
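The rule above can be sketched for single values in plain Python. This is only an illustration: the helper name `is_us_zipcode` and the sample zip codes are made up for the example and do not come from the dataset.

```python
def is_us_zipcode(value):
    """Return True when the value can be read as an integer (treated here as a US zip code)."""
    try:
        int(value)
        return True
    except (ValueError, TypeError, OverflowError):
        # Alphanumeric strings, None, NaN and infinities all end up here.
        return False


# Hypothetical sample values: two US-style codes, two Canadian-style codes, one missing.
sample_zipcodes = ["85016", "28210", "M5V 2T6", "H3B 1A7", None]
labels = {z: ("EEUU" if is_us_zipcode(z) else "Canada") for z in sample_zipcodes}
print(labels)
```

Note that under this rule a missing zip code is classified as Canadian, so it may be worth checking how many such rows exist before relying on the split.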

In [None]:
business_filtered_eeuu

In [None]:
business_filtered_canada

In [None]:
business_filtered_eeuu.to_csv("data/business/segmentation/eeuu/business_filtered_eeuu.csv", index=False)
business_filtered_canada.to_csv("data/business/segmentation/canada/business_filtered_canada.csv", index=False)

#### 2nd segmentation: Restaurants

**US case**

In [None]:
business_filtered_eeuu = pd.read_csv("data/business/segmentation/eeuu/business_filtered_eeuu.csv", sep=",")

In [None]:
# True for rows whose comma-separated category list contains 'Restaurants';
# rows with missing categories are treated as non-restaurants.
is_restaurant = business_filtered_eeuu.categories.fillna("").apply(
    lambda cats: "Restaurants" in cats.replace(" ", "").split(",")
)

In [None]:
business_filtered_eeuu_categorized = business_filtered_eeuu[is_restaurant]

In [None]:
business_filtered_eeuu_categorized

Unnamed: 0,business_id,categories,city,num_reviews,open,rating,zipcode
1,gnKjwL_1w79qoiV3IC_xQQ,"Sushi Bars, Restaurants, Japanese",Charlotte,170.0,1.0,4.0,28210.0
7,1Dfx3zM-rW4n-31KeC8sJg,"Restaurants, Breakfast & Brunch, Mexican, Taco...",Phoenix,18.0,1.0,3.0,85016.0
9,fweCYi8FmbJXHCqLnwuk8w,"Italian, Restaurants, Pizza, Chicken Wings",Mentor-on-the-Lake,16.0,1.0,4.0,44060.0
12,PZ-LZzSlhSe9utkQYU8pFg,"Restaurants, Italian",Las Vegas,40.0,0.0,4.0,89119.0
17,1RHY4K3BD22FK7Cfftn8Mg,"Sandwiches, Salad, Restaurants, Burgers, Comfo...",Pittsburgh,35.0,1.0,4.0,15231.0
...,...,...,...,...,...,...,...
141150,cfrN6-lQC-dzjBtNBjefpQ,"American (New), Restaurants",Kent,3.0,0.0,2.5,44240.0
141152,JsRt9LPgv-7guVcY4u6OQA,"Pizza, Italian, Restaurants, Seafood",Huntersville,142.0,1.0,4.5,28078.0
141156,7wZgquJ30qkVQbvbJo92ow,"Middle Eastern, Falafel, Mediterranean, Restau...",Madison,6.0,1.0,3.5,53717.0
141163,ghovD5ZTGDQ5Q2U4ERddWw,"Burgers, Restaurants, Fast Food, American (New)",Fairlawn,22.0,1.0,4.0,44333.0


In [None]:
business_filtered_eeuu_categorized.to_csv("data/business/segmentation/eeuu/business_filtered_eeuu_categorized.csv", index=False)

**Canada case**

In [None]:
business_filtered_canada = pd.read_csv("data/business/segmentation/canada/business_filtered_canada.csv", sep=",")

In [None]:
# True for rows whose comma-separated category list contains 'Restaurants';
# rows with missing categories are treated as non-restaurants.
is_restaurant = business_filtered_canada.categories.fillna("").apply(
    lambda cats: "Restaurants" in cats.replace(" ", "").split(",")
)

In [None]:
business_filtered_canada_categorized = business_filtered_canada[is_restaurant]

In [None]:
business_filtered_canada_categorized

In [None]:
business_filtered_canada_categorized.to_csv("data/business/segmentation/canada/business_filtered_canada_categorized.csv", index=False)
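Both cases above keep a business only when `Restaurants` is one of its comma-separated categories, not merely a substring of one. A minimal sketch of that check follows; the helper name `has_category` and the second sample string are hypothetical.

```python
def has_category(categories, wanted="Restaurants"):
    """Return True when `wanted` is exactly one of the comma-separated categories."""
    if not isinstance(categories, str):
        # Missing values (for example NaN) carry no categories.
        return False
    return wanted in [c.strip() for c in categories.split(",")]


print(has_category("Sushi Bars, Restaurants, Japanese"))
print(has_category("Pop-Up Restaurants, Food"))  # made-up string: substring only, not an exact category
print(has_category(float("nan")))
```

Splitting before comparing is what separates an exact category match from a plain substring test such as `"Restaurants" in categories`.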

#### 3rd segmentation: Business concentration

In [None]:
eeuu_categorized = pd.read_csv("data/business/segmentation/eeuu/business_filtered_eeuu_categorized.csv", sep=",")

In [None]:
eeuu_categorized = pd.merge(eeuu_categorized, , on='business_id')

In [None]:
eeuu_categorized_AZ = eeuu_categorized[eeuu_categorized.zipcode ]

In [None]:
restaurants = pd.read_csv('data/business/segmentation/eeuu/zipcodes/restaurants_zipcodes_CORREGIDOS.csv', sep=",")
# .copy() makes restaurants_WI an independent dataframe, so the assignments
# below do not raise SettingWithCopyWarning.
restaurants_WI = restaurants[(restaurants.zipcode >= 53001) & (restaurants.zipcode <= 54990)].copy()
restaurants_WI['zone'] = None

In [None]:
# Repeat for every state:

# Zip codes by zone
zip_west_WI = [53572,53515,53583,53528,53529,53593,53508,53719,53717,54562,53562]
zip_center_WI = [53711,56705,53597,53726,53715,53597,56706,56713,56716,56703,53575,53716,53703,53713,53701,53706,53705,53792,54704,53521,53725]
zip_east_WI = [53708,53704,53714,56716,53718,53527,53598,53590,53559,53783,53589,53558,53531,53532,53571,53596]

# Build each mask from restaurants_WI itself so it lines up with the rows being updated.
restaurants_WI.loc[restaurants_WI['zipcode'].isin(zip_west_WI),'zone'] = 'Oeste'
restaurants_WI.loc[restaurants_WI['zipcode'].isin(zip_center_WI),'zone'] = 'Centro'
restaurants_WI.loc[restaurants_WI['zipcode'].isin(zip_east_WI),'zone'] = 'Este'

restaurants_WI.to_csv('data/business/segmentation/eeuu/AA/restaurants_WI.csv', index=False)
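Since the zone assignment is repeated for every state, one possible way to organise it is to invert the per-zone lists into a single zip-to-zone lookup and to flag zip codes listed under more than one zone. The sketch below is an assumption about how this could be structured, not the notebook's actual method; `build_zone_lookup` is a hypothetical helper and `zones_WI` is a small subset of the Wisconsin lists.

```python
from collections import defaultdict


def build_zone_lookup(zones):
    """Invert {zone: [zipcodes]} into {zipcode: zone}.

    When a zip code appears under several zones, the zone listed last wins,
    which mirrors the order of the successive .loc assignments.
    Also returns the zip codes that were listed under more than one zone.
    """
    seen = defaultdict(list)
    for zone, zipcodes in zones.items():
        for zipcode in dict.fromkeys(zipcodes):  # drop repeats within one list
            seen[zipcode].append(zone)
    lookup = {zipcode: names[-1] for zipcode, names in seen.items()}
    conflicts = {zipcode: names for zipcode, names in seen.items() if len(names) > 1}
    return lookup, conflicts


# Subset of the Wisconsin lists, kept short for illustration.
zones_WI = {
    "Oeste": [53572, 53515, 53583],
    "Centro": [53711, 53597, 53597, 56716],
    "Este": [53708, 53704, 56716],
}
zone_lookup, conflicts = build_zone_lookup(zones_WI)
print(zone_lookup[53572])
print(conflicts)
```

A lookup like this could then be applied with `restaurants_WI['zipcode'].map(zone_lookup)`, which leaves zip codes absent from the lookup as NaN rather than `None`. The `conflicts` dictionary makes overlapping lists visible before the zones are written to disk.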
