# Isochrones from California's universities
#### A Voilà app for visualizing isochrones from California's public universities.

<details>
    <summary><strong>Goal</strong></summary>
    The goal of this notebook is to open the zip file and prepare the data for the app.
    <ul>
        <li> Measurable goals </li>
        <li> Clean the data and values from the zip file and create a ready-to-use CSV file</li>
    </ul>
</details>

<details>
    <summary><strong>Context</strong></summary>
    We downloaded the data from IPEDS, with information on California's public 2-year and 4-year universities and colleges.
</details>

In [7]:
import pandas as pd
from zipfile import ZipFile
from pathlib import Path
from herramientas import arbol
import arrow
hoy = arrow.now().format("DD-MMM-YY", locale = 'es')

hoy

'05-jul-19'

In [8]:
DATOS_BRUTOS = Path("../datos/brutos/")
DATOS_INTERINOS = Path("../datos/interinos/")
DATOS_PROCESADOS = Path("../datos/procesados/")
DATOS_EXTERNOS = Path("../datos/externos/")

In [9]:
arbol(DATOS_BRUTOS)

+ ..\datos\brutos
    + ipeds-julio-5.zip
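The `arbol` helper comes from the local `herramientas` module, whose source is not shown in this notebook. As a rough sketch only, assuming the helper simply walks a directory and prefixes each entry with `+`, something like the hypothetical `arbol_esbozo` below would produce a similar listing. The function name and the choice to return lines instead of printing them are assumptions made for illustration.

```python
from pathlib import Path


def arbol_esbozo(ruta, sangria=0):
    """Hypothetical stand-in for herramientas.arbol: list a directory tree."""
    ruta = Path(ruta)
    # The root is shown with its full path, children only by name.
    etiqueta = ruta if sangria == 0 else ruta.name
    lineas = [" " * sangria + f"+ {etiqueta}"]
    if ruta.is_dir():
        # Sort entries so the listing is stable across platforms.
        for hijo in sorted(ruta.iterdir(), key=lambda p: p.name):
            lineas.extend(arbol_esbozo(hijo, sangria + 4))
    return lineas
```

Calling `print("\n".join(arbol_esbozo(DATOS_BRUTOS)))` would then print the tree for the raw-data folder.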


In [11]:
ZipFile(DATOS_BRUTOS / 'ipeds-julio-5.zip').extractall(DATOS_BRUTOS / 'datos_ipeds/')
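Before extracting, it can help to list what the archive contains. The sketch below builds a tiny in-memory zip so it runs without the IPEDS download; the member names `Data_ejemplo.csv` and `ValueLabels_ejemplo.csv` are made up for illustration and are not the real file names.

```python
import io
from zipfile import ZipFile

# Build a small in-memory archive (hypothetical contents, not the IPEDS data).
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archivo:
    archivo.writestr("Data_ejemplo.csv", "UnitID,Institution Name\n1,Example College\n")
    archivo.writestr("ValueLabels_ejemplo.csv", "VariableName,Value,ValueLabel\n")

# Reopen the archive for reading and list its members before extracting.
buffer.seek(0)
with ZipFile(buffer) as archivo:
    miembros = archivo.namelist()
    solo_csv = [nombre for nombre in miembros if nombre.endswith(".csv")]
```

With the real file, `ZipFile(DATOS_BRUTOS / 'ipeds-julio-5.zip').namelist()` would return the member names without writing anything to disk.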

In [12]:
arbol(DATOS_BRUTOS)

+ ..\datos\brutos
    + datos_ipeds
        + Data_7-5-2019---729.csv
        + ValueLabels_7-5-2019---729.csv
    + ipeds-julio-5.zip


In [40]:
datos = pd.read_csv(DATOS_BRUTOS / 'datos_ipeds' / 'Data_7-5-2019---729.csv')
valores = pd.read_csv(DATOS_BRUTOS / 'datos_ipeds' / 'ValueLabels_7-5-2019---729.csv')

In [41]:
datos.head()

Unnamed: 0,UnitID,Institution Name,Sector of institution (HD2017),Institution name alias (HD2017),Institution (entity) name (HD2017),Street address or post office box (HD2017),City location of institution (HD2017),ZIP code (HD2017),Institution's internet website address (HD2017),Financial aid office web address (HD2017),Admissions office web address (HD2017),Online application web address (HD2017),Unnamed: 12
0,108807,Allan Hancock College,4,,Allan Hancock College,800 South College Drive,Santa Maria,93454-6399,www.hancockcollege.edu/,www.hancockcollege.edu/financial_aid/index.php,www.hancockcollege.edu/admissions_records/,www.hancockcollege.edu/future_students/,
1,109208,American River College,4,American River | ARC,American River College,4700 College Oak Dr,Sacramento,95841-4286,www.arc.losrios.edu/,www.arc.losrios.edu/Support_Services/Financial...,www.arc.losrios.edu/prospective_students.htm,www.losrios.edu/lrc/lrc_app.php,
2,109350,Antelope Valley College,1,,Antelope Valley College,3041 West Ave K,Lancaster,93536-5426,www.avc.edu,www.avc.edu/studentservices/finaid/,www.avc.edu/studentservices/adminrec/,www.avc.edu/studentservices/adminrec/applyonline,
3,109819,Bakersfield College,1,,Bakersfield College,1801 Panorama Dr,Bakersfield,93305-1299,www.bakersfieldcollege.edu/,www.bakersfieldcollege.edu/finaid/,www.bakersfieldcollege.edu/admissions/,https://www.bakersfieldcollege.edu/apply,
4,109907,Barstow Community College,4,,Barstow Community College,2700 Barstow Road,Barstow,92311,www.barstow.edu,www.barstow.edu/Financial-Aid.html,www.barstow.edu/Admission-and-Records.html,https://www.opencccapply.net/uPortal/render.us...,


In [42]:
valores.head()

Unnamed: 0,VariableName,Value,ValueLabel
0,Sector of institution (HD2017),1,"Public, 4-year or above"
1,Sector of institution (HD2017),4,"Public, 2-year"


# Education systems
California has 3 public higher-education systems: _California State University_ (CSU), _University of California_ (UC), and _California Community Colleges_ (CCC).
CCCs are community colleges that normally award only _Associate's_ (2-year) degrees. For the past few years, CCCs have also been able to award _Bachelor's_ degrees, which means that some of them are classified as _Public, 4-year or above_.

We will create our labels first for CSU (23 universities), then for UC (11 universities); the rest will be the CCCs.
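The labeling order described above can also be written as a single pass with `numpy.select`, which checks conditions in order and takes the first match. This is an alternative sketch, not the approach used in the cells that follow. The rows are illustrative, and the column name `Sector` and the default label `Sin clasificar` are assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative rows only; the notebook reads these columns from IPEDS.
ejemplo = pd.DataFrame({
    "Institution Name": [
        "California State University-Fresno",
        "University of California-Davis",
        "Allan Hancock College",
        "Bakersfield College",
        "Hacienda La Puente Adult Education",
    ],
    "Sector": [
        "Public, 4-year or above",
        "Public, 4-year or above",
        "Public, 2-year",
        "Public, 4-year or above",
        "Public, 2-year",
    ],
})

nombre = ejemplo["Institution Name"]
es_4_años = ejemplo["Sector"] == "Public, 4-year or above"

# Conditions are evaluated in order, so CSU and UC take priority over
# the broader "College" rule.
condiciones = [
    es_4_años & nombre.str.contains("State"),
    es_4_años & nombre.str.contains("University of California"),
    nombre.str.contains("College"),
]
ejemplo["Sistema"] = np.select(
    condiciones, ["CSU", "UC", "CCC"], default="Sin clasificar"
)
```

Because the first matching condition wins, a 4-year institution whose name contains both "University of California" and "College" would keep the `UC` label.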

In [72]:
# Build a {code: label} lookup from the value-labels table.
valores_sector = dict(zip(valores['Value'], valores['ValueLabel']))

valores_sector

{1: 'Public, 4-year or above', 4: 'Public, 2-year'}

In [44]:
datos['Sector of institution (HD2017)'] = datos['Sector of institution (HD2017)'].map(valores_sector)

datos.head()

Unnamed: 0,UnitID,Institution Name,Sector of institution (HD2017),Institution name alias (HD2017),Institution (entity) name (HD2017),Street address or post office box (HD2017),City location of institution (HD2017),ZIP code (HD2017),Institution's internet website address (HD2017),Financial aid office web address (HD2017),Admissions office web address (HD2017),Online application web address (HD2017),Unnamed: 12
0,108807,Allan Hancock College,"Public, 2-year",,Allan Hancock College,800 South College Drive,Santa Maria,93454-6399,www.hancockcollege.edu/,www.hancockcollege.edu/financial_aid/index.php,www.hancockcollege.edu/admissions_records/,www.hancockcollege.edu/future_students/,
1,109208,American River College,"Public, 2-year",American River | ARC,American River College,4700 College Oak Dr,Sacramento,95841-4286,www.arc.losrios.edu/,www.arc.losrios.edu/Support_Services/Financial...,www.arc.losrios.edu/prospective_students.htm,www.losrios.edu/lrc/lrc_app.php,
2,109350,Antelope Valley College,"Public, 4-year or above",,Antelope Valley College,3041 West Ave K,Lancaster,93536-5426,www.avc.edu,www.avc.edu/studentservices/finaid/,www.avc.edu/studentservices/adminrec/,www.avc.edu/studentservices/adminrec/applyonline,
3,109819,Bakersfield College,"Public, 4-year or above",,Bakersfield College,1801 Panorama Dr,Bakersfield,93305-1299,www.bakersfieldcollege.edu/,www.bakersfieldcollege.edu/finaid/,www.bakersfieldcollege.edu/admissions/,https://www.bakersfieldcollege.edu/apply,
4,109907,Barstow Community College,"Public, 2-year",,Barstow Community College,2700 Barstow Road,Barstow,92311,www.barstow.edu,www.barstow.edu/Financial-Aid.html,www.barstow.edu/Admission-and-Records.html,https://www.opencccapply.net/uPortal/render.us...,


In [45]:
mascara_4_años = datos['Sector of institution (HD2017)'] == "Public, 4-year or above"
mascara_2_años = datos['Sector of institution (HD2017)'] == 'Public, 2-year'

mascara_CSU = datos['Institution Name'].str.contains("State")
mascara_UC = datos['Institution Name'].str.contains("University of California")
mascara_CCC = datos['Institution Name'].str.contains("College")

#### Let's check that the numbers add up

In [46]:
datos[mascara_4_años & mascara_CSU].shape

(23, 13)

In [47]:
datos[mascara_4_años & mascara_UC].shape

(11, 13)

In [48]:
datos[mascara_2_años & mascara_CCC].shape

(100, 13)

There should be 114 CCCs, but the other counts match. Let's proceed with what we have and fix the rest later.
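The comparison between observed and expected counts can be made explicit with a small helper. This is a minimal sketch: `diferencias_de_conteo` is a hypothetical name, the observed counts are the ones printed by the cells above, and the expected totals are the ones stated in the text.

```python
def diferencias_de_conteo(observados, esperados):
    """Return expected-minus-observed for every system whose counts differ."""
    return {
        sistema: esperados[sistema] - observados.get(sistema, 0)
        for sistema in esperados
        if esperados[sistema] != observados.get(sistema, 0)
    }


observados = {"CSU": 23, "UC": 11, "CCC": 100}  # counts from the cells above
esperados = {"CSU": 23, "UC": 11, "CCC": 114}   # totals stated in the text
faltantes = diferencias_de_conteo(observados, esperados)
```

Here `faltantes` contains only the `CCC` entry, with a shortfall of 14 institutions.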

In [49]:
datos.loc[mascara_4_años & mascara_CSU, 'Sistema'] = 'CSU'
datos.loc[mascara_4_años & mascara_UC, 'Sistema'] = 'UC'
datos.loc[mascara_2_años & mascara_CCC, 'Sistema'] = 'CCC'

Let's look at the rest

In [50]:
datos[datos['Sistema'].isna()]

Unnamed: 0,UnitID,Institution Name,Sector of institution (HD2017),Institution name alias (HD2017),Institution (entity) name (HD2017),Street address or post office box (HD2017),City location of institution (HD2017),ZIP code (HD2017),Institution's internet website address (HD2017),Financial aid office web address (HD2017),Admissions office web address (HD2017),Online application web address (HD2017),Unnamed: 12,Sistema
2,109350,Antelope Valley College,"Public, 4-year or above",,Antelope Valley College,3041 West Ave K,Lancaster,93536-5426,www.avc.edu,www.avc.edu/studentservices/finaid/,www.avc.edu/studentservices/adminrec/,www.avc.edu/studentservices/adminrec/applyonline,,
3,109819,Bakersfield College,"Public, 4-year or above",,Bakersfield College,1801 Panorama Dr,Bakersfield,93305-1299,www.bakersfieldcollege.edu/,www.bakersfieldcollege.edu/finaid/,www.bakersfieldcollege.edu/admissions/,https://www.bakersfieldcollege.edu/apply,,
50,113236,Cypress College,"Public, 4-year or above",,Cypress College,9200 Valley View,Cypress,90630-5897,www.cypresscollege.edu,www.cypresscollege.edu/admissions/financialAid,www.cypresscollege.edu/admissions,www.cypresscollege.edu/admissions/gettingStarted,,
54,413802,East San Gabriel Valley Regional Occupational ...,"Public, 2-year",,East San Gabriel Valley Regional Occupational ...,1501 W Del Norte St.,West Covina,91790,www.esgvrop.org,www.esgvrop.org,www.esgvrop.org,www.esgvrop.org,,
58,114433,Feather River Community College District,"Public, 4-year or above",Feather River College,Feather River Community College District,570 Golden Eagle Ave,Quincy,95971-9124,www.frc.edu,www.frc.edu/financialaid/,www.frc.edu/admissions/index.cfm,https://banner.frc.edu/pls/PROD/bwskalog.P_Dis...,,
60,114716,Foothill College,"Public, 4-year or above",,Foothill College,12345 El Monte Rd,Los Altos Hills,94022,www.foothill.edu,https://foothill.edu/financialaid/,www.foothill.edu/admissions.php,www.foothill.edu/apply/,,
67,383084,Hacienda La Puente Adult Education,"Public, 2-year",,Hacienda La Puente Adult Education,14101 E. Nelson Ave,La Puente,91746-0002,www.hlpae.com,,,,,
89,118912,MiraCosta College,"Public, 4-year or above",,MiraCosta College,One Barnard Drive,Oceanside,92056-3899,www.miracosta.edu,www.miracosta.edu/studentservices/financialaid...,www.miracosta.edu/studentservices/admissions/i...,www.miracosta.edu/studentservices/applyenroll/...,,
91,118976,Modesto Junior College,"Public, 4-year or above",MJC,Modesto Junior College,435 College Ave,Modesto,95350-5800,www.mjc.edu,www.mjc.edu/studentservices/finaid/,www.mjc.edu/studentservices/enrollment/,www.mjc.edu/studentservices/enrollment/admissi...,,
107,121886,Rio Hondo College,"Public, 4-year or above",RHCCD|RHC,Rio Hondo College,3600 Workman Mill Rd,Whittier,90601-1616,www.riohondo.edu,www.riohondo.edu/financial-aid/,www.riohondo.edu/admissions/,www.riohondo.edu/get-started/,,


It appears that there are two adult-education institutions, and that the remaining CCC colleges are classified as _Public, 4-year or above_. So let's use that mask and drop the 2 institutions that are not universities or colleges.
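Dropping the unlabeled rows can be sketched on a toy frame with `DataFrame.dropna(subset=...)`, while recording how many rows were removed so the result can be checked against the expected 2. The three rows below reuse names from the table above, but the frame itself is illustrative.

```python
import pandas as pd

# Toy frame: one row has no system label and should be dropped.
ejemplo = pd.DataFrame({
    "Institution Name": [
        "Cypress College",
        "Hacienda La Puente Adult Education",
        "Foothill College",
    ],
    "Sistema": ["CCC", None, "CCC"],
})

antes = len(ejemplo)
# Keep only rows that received a label.
ejemplo = ejemplo.dropna(subset=["Sistema"]).reset_index(drop=True)
eliminadas = antes - len(ejemplo)
```

Comparing `eliminadas` with the number of institutions expected to be removed is a cheap guard against dropping real colleges by accident.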

In [51]:
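# Only fill rows that are still unlabeled, so existing CSU/UC labels are never overwritten.
datos.loc[mascara_4_años & mascara_CCC & datos['Sistema'].isna(), 'Sistema'] = 'CCC'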

In [53]:
datos = datos[~(datos['Sistema'].isna())]

In [54]:
datos.shape

(149, 14)

Let's keep only our columns of interest

In [55]:
datos.columns

Index(['UnitID', 'Institution Name', 'Sector of institution (HD2017)',
       'Institution name alias (HD2017)', 'Institution (entity) name (HD2017)',
       'Street address or post office box (HD2017)',
       'City location of institution (HD2017)', 'ZIP code (HD2017)',
       "Institution's internet website address (HD2017)",
       'Financial aid office web address (HD2017)',
       'Admissions office web address (HD2017)',
       'Online application web address (HD2017)', 'Unnamed: 12', 'Sistema'],
      dtype='object')

In [58]:
columnas_de_interes = [
    'Sistema',
    'Institution Name', 
    'Institution name alias (HD2017)', 
    'Institution (entity) name (HD2017)',
    'Street address or post office box (HD2017)',
    'City location of institution (HD2017)',
    'ZIP code (HD2017)',
    "Institution's internet website address (HD2017)",
    'Financial aid office web address (HD2017)',
    'Admissions office web address (HD2017)',
    'Online application web address (HD2017)',
]

In [59]:
datos_limpios = datos[columnas_de_interes].copy()

In [60]:
datos_limpios.shape

(149, 11)

In [62]:
datos_limpios.head()

Unnamed: 0,Sistema,Institution Name,Institution name alias (HD2017),Institution (entity) name (HD2017),Street address or post office box (HD2017),City location of institution (HD2017),ZIP code (HD2017),Institution's internet website address (HD2017),Financial aid office web address (HD2017),Admissions office web address (HD2017),Online application web address (HD2017)
0,CCC,Allan Hancock College,,Allan Hancock College,800 South College Drive,Santa Maria,93454-6399,www.hancockcollege.edu/,www.hancockcollege.edu/financial_aid/index.php,www.hancockcollege.edu/admissions_records/,www.hancockcollege.edu/future_students/
1,CCC,American River College,American River | ARC,American River College,4700 College Oak Dr,Sacramento,95841-4286,www.arc.losrios.edu/,www.arc.losrios.edu/Support_Services/Financial...,www.arc.losrios.edu/prospective_students.htm,www.losrios.edu/lrc/lrc_app.php
2,CCC,Antelope Valley College,,Antelope Valley College,3041 West Ave K,Lancaster,93536-5426,www.avc.edu,www.avc.edu/studentservices/finaid/,www.avc.edu/studentservices/adminrec/,www.avc.edu/studentservices/adminrec/applyonline
3,CCC,Bakersfield College,,Bakersfield College,1801 Panorama Dr,Bakersfield,93305-1299,www.bakersfieldcollege.edu/,www.bakersfieldcollege.edu/finaid/,www.bakersfieldcollege.edu/admissions/,https://www.bakersfieldcollege.edu/apply
4,CCC,Barstow Community College,,Barstow Community College,2700 Barstow Road,Barstow,92311,www.barstow.edu,www.barstow.edu/Financial-Aid.html,www.barstow.edu/Admission-and-Records.html,https://www.opencccapply.net/uPortal/render.us...


In [63]:
nombres_columnas_español = [
    'Sistema',
    'Nombre', 
    'Alias', 
    'Nombre (alternativo)',
    'Dirección',
    'Ciudad',
    'Código postal',
    "Sitio de internet",
    'Sitio de internet (ayuda financiera)',
    'Sitio de internet (oficina de admisiones)',
    'Sitio de internet (aplicación en línea)',
]

In [64]:
datos_limpios.columns = nombres_columnas_español

In [73]:
# Full address string used to build the isochrones
datos_limpios['Dirección OSMNx'] = datos_limpios['Dirección'] + ", " + datos_limpios['Ciudad'] + ", CA " + datos_limpios['Código postal']
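One thing to keep in mind with this kind of string concatenation is that a missing piece makes the whole address missing. The sketch below shows that on a toy frame; the first row reuses values from the table above, and the second row, with a missing street address, is hypothetical.

```python
import pandas as pd

ejemplo = pd.DataFrame({
    "Dirección": ["800 South College Drive", None],  # second value is hypothetical
    "Ciudad": ["Santa Maria", "Barstow"],
    "Código postal": ["93454-6399", "92311"],
})

# Same concatenation as in the notebook cell.
ejemplo["Dirección OSMNx"] = (
    ejemplo["Dirección"] + ", " + ejemplo["Ciudad"] + ", CA " + ejemplo["Código postal"]
)

# Rows where any component was missing end up with a missing address.
incompletas = ejemplo["Dirección OSMNx"].isna()
```

Checking `incompletas.sum()` before geocoding shows how many institutions would have no usable address.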

In [74]:
datos_limpios.to_csv(DATOS_PROCESADOS / f'base_de_datos-{hoy}.csv', encoding='utf-8', index=False)