<a href="https://colab.research.google.com/github/CharlieCMann/INFO-2950-Project/blob/main/datacleaning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Research question(s). State your research question(s) clearly.**

In examining divorce rates from a sample population of 4,900 from Xalapa, Mexico, we intend to explore the relationship between zodiac sign and marriage compatibility.

Primarily, our research will analyze the zodiac signs of divorced couples, and will seek to uncover patterns between divorce rates, lengths of marriage, and number of children with respect to zodiac pairing.

Additionally, we will compare the observed compatibility scores from our sample dataset with the hypothesized compatibility scores from online astrology experts.

**Potential Limitations**

There are many potential limitations of the dataset. For one, we only have divorce records, so our findings are based on failed marriages rather than successful ones; compatibility is therefore inferred from failure instead of success. The dataset also consists exclusively of heterosexual couples, so we will not be able to generalize these results to same-sex couples. Furthermore, we will be using the subjects' birth dates to compute their sun signs. The sun sign is the most commonly known and discussed facet of one's chart; however, it is only the tip of the iceberg when applying astrological patterns to an individual.

**Data Description**

*What are the observations (rows) and the attributes (columns)?*

Each observation is a divorce. The attributes of each row include the date of the divorce, the date of birth for the man and the woman, the date of the marriage, the number of children, their occupations, levels of education, and monthly incomes.

*Why was this dataset created? Who funded the creation of the dataset?*

This dataset was created by the Mexican government to keep a record of legal divorces for the city of Xalapa, Mexico. The Mexican government funded its creation.

*What processes might have influenced what data was observed and recorded and what was not?*

Since the data tracks legal divorces within Xalapa, Mexico between 2000 and 2015, it does not include separations that did not end in divorce, even though such separations may also reflect strained, incompatible relationships. Additionally, since this data was recorded before the legalization of same-sex marriage, it does not represent the entire population, as same-sex pairings were not recorded.

*What preprocessing was done, and how did the data come to be in the form that you are using?*

The legal dataset did not include marriage duration. The creators of the Kaggle dataset used in this analysis made their own duration variables. These are inconsistently created and will be dropped and recalculated using date of marriage and date of divorce.

*If people are involved, were they aware of the data collection and if so, what purpose did they expect the data to be used for?*

Each observation is the result of a legal divorce filing. The couples likely did not expect three Cornell students to calculate their zodiac compatibility, but the data was collected with their consent in order to finalize their divorce.

*Where can your raw source data be found, if applicable? Provide a link to the raw data (hosted in a Cornell Google Drive or Cornell Box).*

https://drive.google.com/drive/folders/1vSdMsdE4WBRfmD1QmDZCYrti4R_y02Z0?usp=sharing

**Questions for Reviewers**

- How should we go about using the marriage duration, number of kids, etc. to generate a measure of observed compatibility?
- How can we make our for loops more efficient? For example, can they be made into functions, and what recommendations do you have for that?
- How should we interpret the outcomes, given that we only have failed marriages and not successful ones?

In [341]:
import pandas as pd

In [342]:
zodiac_comp = pd.read_csv('Comp_matrix.csv')
zodiac_comp.head()

Unnamed: 0,Zodiac_combination,Compatibility_rate
0,CancerPisces,0.98
1,TaurusGemini,0.33
2,VirgoPisces,0.88
3,PiscesAries,0.67
4,CancerAquarius,0.27


In [343]:
zod_columns = zodiac_comp.columns.str.lower()
#zod_columns

In [344]:
zodiac_comp_rate = zodiac_comp.copy()
zodiac_comp_rate.columns = zod_columns
zodiac_comp_rate.head()

Unnamed: 0,zodiac_combination,compatibility_rate
0,CancerPisces,0.98
1,TaurusGemini,0.33
2,VirgoPisces,0.88
3,PiscesAries,0.67
4,CancerAquarius,0.27


In [345]:
divorce = pd.read_csv('divorces_2000-2015_translated.csv')
divorce.head()

Unnamed: 0,Divorce_date,Type_of_divorce,Nationality_partner_man,DOB_partner_man,Place_of_birth_partner_man,Birth_municipality_of_partner_man,Birth_federal_partner_man,Birth_country_partner_man,Age_partner_man,Residence_municipality_partner_man,Residence_federal_partner_man,Residence_country_partner_man,Monthly_income_partner_man_peso,Occupation_partner_man,Place_of_residence_partner_man,Nationality_partner_woman,DOB_partner_woman,DOB_registration_date_partner_woman,Place_of_birth_partner_woman,Birth_municipality_of_partner_woman,Birth_federal_partner_woman,Birth_country_partner_woman,Age_partner_woman,Place_of_residence_partner_woman,Residence_municipality_partner_woman,Residence_federal_partner_woman,Residence_country_partner_woman,Occupation_partner_woman,Monthly_income_partner_woman_peso,Date_of_marriage,Marriage_certificate_place,Marriage_certificate_municipality,Marriage_certificate_federal,Level_of_education_partner_man,Employment_status_partner_man,Level_of_education_partner_woman,Employment_status_partner_woman,Marriage_duration,Marriage_duration_months,Num_Children,Custody
0,9/6/06,Necesario,MEXICANA,18/12/75,XALAPA - ENRIQUEZ,XALAPA,VERACRUZ,MEXICO,30.0,XALAPA,VERACRUZ,MEXICO,2000.0,PINTOR,XALAPA-ENRIQUEZ,MEXICANA,8/1/83,,PUEBLA,PUEBLA,PUEBLA,MEXICO,22.0,XALAPA-ENRIQUEZ,XALAPA,VERACRUZ,MEXICO,EMPLEADA,1800.0,26/6/00,XALAPA,XALAPA,VERACRUZ,SECUNDARIA,OBRERO,SECUNDARIA,EMPLEADO,5.0,,1.0,
1,1/2/00,Voluntario,MEXICANA,,,,,,47.0,,,,,,,MEXICANA,,,,,,,41.0,,,,,,,17/2/77,XALAPA,XALAPA,VERACRUZ,PREPARATORIA,ESTABLECIMIENTO,PREPARATORIA,EMPLEADO,,,,
2,1/2/05,Necesario,MEXICANA,22/2/55,XALAPA - ENRIQUEZ,XALAPA,VERACRUZ,MEXICO,49.0,,,,,MEDICO,,MEXICANA,21/3/47,,XALAPA-ENRIQUEZ,XALAPA,VERACRUZ,MEXICO,57.0,XALAPA-ENRIQUEZ,XALAPA,VERACRUZ,MEXICO,JUBILADA,,18/12/75,XALAPA,XALAPA,VERACRUZ,PREPARATORIA,OBRERO,,TRABAJADOR POR CUENTA PROPIA EN VIA PUBLICA,,,,
3,1/2/06,Necesario,MEXICANA,20/1/64,XALAPA - ENRIQUEZ,XALAPA,VERACRUZ,MEXICO,42.0,XALAPA,VERACRUZ,MEXICO,6000.0,EMPLEADO,XALAPA-ENRIQUEZ,MEXICANA,,,XALAPA-ENRIQUEZ,XALAPA,VERACRUZ,MEXICO,,XALAPA-ENRIQUEZ,XALAPA,VERACRUZ,MEXICO,COMERCIANTE,5000.0,3/12/87,XALAPA,XALAPA,VERACRUZ,PROFESIONAL,EMPLEADO,PREPARATORIA,EMPLEADO,18.0,,2.0,MADRE
4,1/2/06,Necesario,MEXICANA,30/10/75,XALAPA - ENRIQUEZ,XALAPA,VERACRUZ,MEXICO,30.0,COATEPEC,VERACRUZ,MEXICO,18000.0,MEDICO,COATEPEC,MEXICANA,13/10/78,,XALAPA-ENRIQUEZ,XALAPA,VERACRUZ,MEXICO,27.0,COATEPEC,COATEPEC,VERACRUZ,MEXICO,AMA DE CASA,,14/11/98,XALAPA,XALAPA,VERACRUZ,PROFESIONAL,EMPLEADO,PREPARATORIA,NO TRABAJA,7.0,,2.0,MADRE


In [346]:
new_colnames = divorce.columns.str.lower()
#new_colnames

In [347]:
new_colnames = new_colnames.str.replace(" ", "_")
divorce_data = divorce.copy()
divorce_data.columns = new_colnames
#divorce_data.head()
divorce_data.columns

Index(['divorce_date', 'type_of_divorce', 'nationality_partner_man',
       'dob_partner_man', 'place_of_birth_partner_man',
       'birth_municipality_of_partner_man', 'birth_federal_partner_man',
       'birth_country_partner_man', 'age_partner_man',
       'residence_municipality_partner_man', 'residence_federal_partner_man',
       'residence_country_partner_man', 'monthly_income_partner_man_peso',
       'occupation_partner_man', 'place_of_residence_partner_man',
       'nationality_partner_woman', 'dob_partner_woman',
       'dob_registration_date_partner_woman', 'place_of_birth_partner_woman',
       'birth_municipality_of_partner_woman', 'birth_federal_partner_woman',
       'birth_country_partner_woman', 'age_partner_woman',
       'place_of_residence_partner_woman',
       'residence_municipality_partner_woman',
       'residence_federal_partner_woman', 'residence_country_partner_woman',
       'occupation_partner_woman', 'monthly_income_partner_woman_peso',
       'date_of_m

In [348]:
divorce_data = divorce_data[['divorce_date', 'date_of_marriage', 'type_of_divorce',
       'dob_partner_man', 'age_partner_man', 'monthly_income_partner_man_peso',
       'occupation_partner_man', 'dob_partner_woman', 'age_partner_woman',
       'occupation_partner_woman', 'monthly_income_partner_woman_peso',
       'level_of_education_partner_man', 'employment_status_partner_man',
       'level_of_education_partner_woman', 'employment_status_partner_woman',
       'num_children', 'custody']]
divorce_data = divorce_data.rename(columns={'date_of_marriage': "marriage_date", 'dob_partner_man': "dob_man", 'dob_partner_woman': "dob_woman"})
divorce_data.head()

Unnamed: 0,divorce_date,marriage_date,type_of_divorce,dob_man,age_partner_man,monthly_income_partner_man_peso,occupation_partner_man,dob_woman,age_partner_woman,occupation_partner_woman,monthly_income_partner_woman_peso,level_of_education_partner_man,employment_status_partner_man,level_of_education_partner_woman,employment_status_partner_woman,num_children,custody
0,9/6/06,26/6/00,Necesario,18/12/75,30.0,2000.0,PINTOR,8/1/83,22.0,EMPLEADA,1800.0,SECUNDARIA,OBRERO,SECUNDARIA,EMPLEADO,1.0,
1,1/2/00,17/2/77,Voluntario,,47.0,,,,41.0,,,PREPARATORIA,ESTABLECIMIENTO,PREPARATORIA,EMPLEADO,,
2,1/2/05,18/12/75,Necesario,22/2/55,49.0,,MEDICO,21/3/47,57.0,JUBILADA,,PREPARATORIA,OBRERO,,TRABAJADOR POR CUENTA PROPIA EN VIA PUBLICA,,
3,1/2/06,3/12/87,Necesario,20/1/64,42.0,6000.0,EMPLEADO,,,COMERCIANTE,5000.0,PROFESIONAL,EMPLEADO,PREPARATORIA,EMPLEADO,2.0,MADRE
4,1/2/06,14/11/98,Necesario,30/10/75,30.0,18000.0,MEDICO,13/10/78,27.0,AMA DE CASA,,PROFESIONAL,EMPLEADO,PREPARATORIA,NO TRABAJA,2.0,MADRE


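The raw dates are strings in what appears to be day/month/year order with two-digit years (for example, 18/12/75); once converted, pandas displays them as year-month-day.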

In [349]:
type(divorce_data.divorce_date[0])

str

In [350]:
divorce_data['divorce_date'] = pd.to_datetime(divorce_data['divorce_date'])
divorce_data.head()

Unnamed: 0,divorce_date,marriage_date,type_of_divorce,dob_man,age_partner_man,monthly_income_partner_man_peso,occupation_partner_man,dob_woman,age_partner_woman,occupation_partner_woman,monthly_income_partner_woman_peso,level_of_education_partner_man,employment_status_partner_man,level_of_education_partner_woman,employment_status_partner_woman,num_children,custody
0,2006-09-06,26/6/00,Necesario,18/12/75,30.0,2000.0,PINTOR,8/1/83,22.0,EMPLEADA,1800.0,SECUNDARIA,OBRERO,SECUNDARIA,EMPLEADO,1.0,
1,2000-01-02,17/2/77,Voluntario,,47.0,,,,41.0,,,PREPARATORIA,ESTABLECIMIENTO,PREPARATORIA,EMPLEADO,,
2,2005-01-02,18/12/75,Necesario,22/2/55,49.0,,MEDICO,21/3/47,57.0,JUBILADA,,PREPARATORIA,OBRERO,,TRABAJADOR POR CUENTA PROPIA EN VIA PUBLICA,,
3,2006-01-02,3/12/87,Necesario,20/1/64,42.0,6000.0,EMPLEADO,,,COMERCIANTE,5000.0,PROFESIONAL,EMPLEADO,PREPARATORIA,EMPLEADO,2.0,MADRE
4,2006-01-02,14/11/98,Necesario,30/10/75,30.0,18000.0,MEDICO,13/10/78,27.0,AMA DE CASA,,PROFESIONAL,EMPLEADO,PREPARATORIA,NO TRABAJA,2.0,MADRE
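
The raw strings look day-first (a value such as `18/12/75` cannot be month-first), yet the converted `divorce_date` above shows `9/6/06` as `2006-09-06`, which is the month-first reading. The same question applies to the later conversions of `marriage_date`, `dob_man`, and `dob_woman`. The sketch below uses only the standard library to show the two readings of one string; it is an illustration, not the notebook's code. If the columns really are day-first, passing `dayfirst=True` or an explicit `format='%d/%m/%y'` to `pd.to_datetime` is one way to request that reading.

```python
from datetime import datetime

raw = "9/6/06"  # first divorce_date value in the raw data

# Month-first reading (what the converted column above shows)
month_first = datetime.strptime(raw, "%m/%d/%y")
# Day-first reading (consistent with values such as 18/12/75)
day_first = datetime.strptime(raw, "%d/%m/%y")

print(month_first.date())  # 2006-09-06
print(day_first.date())    # 2006-06-09
```

Which reading is correct should be confirmed against the source records, since it changes both the zodiac signs and the marriage lengths computed later.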


In [351]:
#divorce_data['marriage_date'] = pd.to_datetime(divorce_data['marriage_date'],origin=pd.Timestamp('1916-01-01'),unit='us')
#divorce_data['dob_man'] = pd.to_datetime(divorce_data['dob_man'])
#divorce_data['dob_woman'] = pd.to_datetime(divorce_data['dob_woman'])

In [352]:
divorce_data.head(30)

Unnamed: 0,divorce_date,marriage_date,type_of_divorce,dob_man,age_partner_man,monthly_income_partner_man_peso,occupation_partner_man,dob_woman,age_partner_woman,occupation_partner_woman,monthly_income_partner_woman_peso,level_of_education_partner_man,employment_status_partner_man,level_of_education_partner_woman,employment_status_partner_woman,num_children,custody
0,2006-09-06,26/6/00,Necesario,18/12/75,30.0,2000.0,PINTOR,8/1/83,22.0,EMPLEADA,1800.0,SECUNDARIA,OBRERO,SECUNDARIA,EMPLEADO,1.0,
1,2000-01-02,17/2/77,Voluntario,,47.0,,,,41.0,,,PREPARATORIA,ESTABLECIMIENTO,PREPARATORIA,EMPLEADO,,
2,2005-01-02,18/12/75,Necesario,22/2/55,49.0,,MEDICO,21/3/47,57.0,JUBILADA,,PREPARATORIA,OBRERO,,TRABAJADOR POR CUENTA PROPIA EN VIA PUBLICA,,
3,2006-01-02,3/12/87,Necesario,20/1/64,42.0,6000.0,EMPLEADO,,,COMERCIANTE,5000.0,PROFESIONAL,EMPLEADO,PREPARATORIA,EMPLEADO,2.0,MADRE
4,2006-01-02,14/11/98,Necesario,30/10/75,30.0,18000.0,MEDICO,13/10/78,27.0,AMA DE CASA,,PROFESIONAL,EMPLEADO,PREPARATORIA,NO TRABAJA,2.0,MADRE
5,2006-01-02,20/1/95,Necesario,28/3/73,32.0,,EMPLEADO,14/6/76,29.0,,,SECUNDARIA,EMPLEADO,SECUNDARIA,NO TRABAJA,2.0,MADRE
6,2007-01-02,16/8/91,Necesario,13/12/70,36.0,,EMPLEADO,4/11/71,35.0,LABORES DOMESTICAS,,PROFESIONAL,EMPLEADO,PROFESIONAL,NO TRABAJA,2.0,MADRE
7,2007-01-02,17/9/99,Necesario,17/2/75,31.0,,LICENCIADO,27/8/74,32.0,LICENCIADA,,PROFESIONAL,EMPLEADO,PROFESIONAL,EMPLEADO,1.0,MADRE
8,2008-01-02,3/6/06,Voluntario,2/12/76,31.0,15000.0,COMERCIANTE,3/1/80,28.0,EMPLEADA,,PROFESIONAL,EMPLEADO,PROFESIONAL,NO TRABAJA,,
9,2008-01-02,9/2/01,Voluntario,17/11/76,31.0,6000.0,EMPLEADO,13/3/77,30.0,EMPLEADA,6000.0,PROFESIONAL,EMPLEADO,PROFESIONAL,EMPLEADO,,


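Dropping observations that are missing any of the four dates. Without a birth date we cannot assign a zodiac sign, and without the marriage and divorce dates we cannot compute the marriage length, so these rows cannot be included in the analysis.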

In [353]:
divorce_data.dropna(subset=['dob_man','dob_woman','marriage_date','divorce_date'],inplace=True)

Assigning the correct century to each two-digit year: years above 15 are treated as 19xx and the rest as 20xx.

In [354]:
def expand_year(date):
    """Turn a 'day/month/yy' string into 'day/month/yyyy'."""
    day, month, year = date.split('/')
    # Two-digit years above 15 are read as 19xx, the rest as 20xx
    # (the divorce records end in 2015).
    if int(year) > 15:
        year = '19' + year
    else:
        year = '20' + year
    return day + '/' + month + '/' + year

# Apply the same fix to every two-digit-year date column, then parse it.
for col in ['marriage_date', 'dob_man', 'dob_woman']:
    divorce_data[col] = divorce_data[col].astype(str).apply(expand_year)
    divorce_data[col] = pd.to_datetime(divorce_data[col])

divorce_data.head()

Unnamed: 0,divorce_date,marriage_date,type_of_divorce,dob_man,age_partner_man,monthly_income_partner_man_peso,occupation_partner_man,dob_woman,age_partner_woman,occupation_partner_woman,monthly_income_partner_woman_peso,level_of_education_partner_man,employment_status_partner_man,level_of_education_partner_woman,employment_status_partner_woman,num_children,custody
0,2006-09-06,2000-06-26,Necesario,1975-12-18,30.0,2000.0,PINTOR,1983-08-01,22.0,EMPLEADA,1800.0,SECUNDARIA,OBRERO,SECUNDARIA,EMPLEADO,1.0,
2,2005-01-02,1975-12-18,Necesario,1955-02-22,49.0,,MEDICO,1947-03-21,57.0,JUBILADA,,PREPARATORIA,OBRERO,,TRABAJADOR POR CUENTA PROPIA EN VIA PUBLICA,,
4,2006-01-02,1998-11-14,Necesario,1975-10-30,30.0,18000.0,MEDICO,1978-10-13,27.0,AMA DE CASA,,PROFESIONAL,EMPLEADO,PREPARATORIA,NO TRABAJA,2.0,MADRE
5,2006-01-02,1995-01-20,Necesario,1973-03-28,32.0,,EMPLEADO,1976-06-14,29.0,,,SECUNDARIA,EMPLEADO,SECUNDARIA,NO TRABAJA,2.0,MADRE
6,2007-01-02,1991-08-16,Necesario,1970-12-13,36.0,,EMPLEADO,1971-04-11,35.0,LABORES DOMESTICAS,,PROFESIONAL,EMPLEADO,PROFESIONAL,NO TRABAJA,2.0,MADRE


In [355]:
divorce_data = divorce_data.sort_values(by="divorce_date")

divorce_data.head()

Unnamed: 0,divorce_date,marriage_date,type_of_divorce,dob_man,age_partner_man,monthly_income_partner_man_peso,occupation_partner_man,dob_woman,age_partner_woman,occupation_partner_woman,monthly_income_partner_woman_peso,level_of_education_partner_man,employment_status_partner_man,level_of_education_partner_woman,employment_status_partner_woman,num_children,custody
42,2000-01-06,1989-08-19,Voluntario,1967-07-15,32.0,,CHOFER,1964-08-23,35.0,EMPLEADA,1500.0,SECUNDARIA,EMPLEADO,OTRO,EMPLEADO,1.0,MADRE
43,2000-01-06,1997-07-26,Necesario,1967-06-18,32.0,,ING. AGRONOMO,1966-11-18,33.0,QUIMICA,,PROFESIONAL,EMPLEADO,PROFESIONAL,NO TRABAJA,,
80,2000-01-08,1995-05-27,Voluntario,1963-08-04,37.0,3000.0,PUBLICISTA,1972-08-25,27.0,SECRETARIA,1700.0,PROFESIONAL,ESTABLECIMIENTO,PREPARATORIA,EMPLEADO,1.0,MADRE
79,2000-01-08,1992-04-09,Voluntario,1965-05-01,35.0,2200.0,FERROCARRILERO,1962-09-30,37.0,PROFESORA,2800.0,SECUNDARIA,EMPLEADO,PROFESIONAL,EMPLEADO,2.0,PADRE
96,2000-01-09,1994-12-19,Voluntario,1972-11-17,27.0,25000.0,ING.,1976-10-29,23.0,PROFESORA,3400.0,PROFESIONAL,EMPLEADO,PROFESIONAL,EMPLEADO,1.0,


recreating the duration variables in months
https://stackoverflow.com/questions/42822768/pandas-number-of-months-between-two-dates

In [356]:
from operator import attrgetter
divorce_data['marriage_months'] = (divorce_data['divorce_date'].dt.to_period('M') - divorce_data['marriage_date'].dt.to_period('M')).apply(attrgetter('n'))
divorce_data['marriage_months'].head()

42    125
43     30
80     56
79     93
96     61
Name: marriage_months, dtype: int64
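
The period subtraction counts the calendar-month boundaries crossed, not full elapsed months. A minimal standard-library sketch of the same arithmetic, with `months_between` as a hypothetical helper name, reproduces the first value in the output above:

```python
from datetime import date

def months_between(start, end):
    """Number of calendar-month boundaries crossed from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)

# First row of the output above: married 1989-08-19, divorced 2000-01-06
print(months_between(date(1989, 8, 19), date(2000, 1, 6)))  # 125
```

Because the day of the month is ignored, a marriage running from 31 January to 1 February counts as 1 month. Dividing the result by 12 gives the duration in years.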

Coding the type of divorce as a number: Voluntario is 1, Necesario is 2. A voluntary divorce means that both parties agree to the divorce; a necessary divorce is one in which one of the parties does not agree.

In [357]:
divorce_data['type_of_divorce'] = [obs.replace("Voluntario","1") for obs in divorce_data['type_of_divorce']]
divorce_data['type_of_divorce'] = [obs.replace("Necesario","2") for obs in divorce_data['type_of_divorce']]
divorce_data['type_of_divorce'] = divorce_data['type_of_divorce'].astype(int)
print(type(divorce_data['type_of_divorce'][0]))

<class 'numpy.int64'>
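
The two string replacements can also be written as a single dictionary lookup. The sketch below is one possible alternative, not the cell above; `DIVORCE_CODES` and `encode_divorce_type` are hypothetical names, and the codes are labels only (2 does not mean "more" than 1).

```python
# Hypothetical mapping that mirrors the coding described above
DIVORCE_CODES = {"Voluntario": 1, "Necesario": 2}

def encode_divorce_type(labels):
    """Map each label to its code; an unknown label raises KeyError."""
    return [DIVORCE_CODES[label] for label in labels]

print(encode_divorce_type(["Voluntario", "Necesario", "Voluntario"]))  # [1, 2, 1]
```

In pandas the same dictionary can be passed to `Series.map`; labels missing from the dictionary then become NaN instead of raising an error.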


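Replacing NaN values in num_children with 0. This treats a missing value as "no children", which is an assumption about how the records were filled in.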

In [358]:
divorce_data['num_children'] = divorce_data['num_children'].fillna(0)
divorce_data.head()

Unnamed: 0,divorce_date,marriage_date,type_of_divorce,dob_man,age_partner_man,monthly_income_partner_man_peso,occupation_partner_man,dob_woman,age_partner_woman,occupation_partner_woman,monthly_income_partner_woman_peso,level_of_education_partner_man,employment_status_partner_man,level_of_education_partner_woman,employment_status_partner_woman,num_children,custody,marriage_months
42,2000-01-06,1989-08-19,1,1967-07-15,32.0,,CHOFER,1964-08-23,35.0,EMPLEADA,1500.0,SECUNDARIA,EMPLEADO,OTRO,EMPLEADO,1.0,MADRE,125
43,2000-01-06,1997-07-26,2,1967-06-18,32.0,,ING. AGRONOMO,1966-11-18,33.0,QUIMICA,,PROFESIONAL,EMPLEADO,PROFESIONAL,NO TRABAJA,0.0,,30
80,2000-01-08,1995-05-27,1,1963-08-04,37.0,3000.0,PUBLICISTA,1972-08-25,27.0,SECRETARIA,1700.0,PROFESIONAL,ESTABLECIMIENTO,PREPARATORIA,EMPLEADO,1.0,MADRE,56
79,2000-01-08,1992-04-09,1,1965-05-01,35.0,2200.0,FERROCARRILERO,1962-09-30,37.0,PROFESORA,2800.0,SECUNDARIA,EMPLEADO,PROFESIONAL,EMPLEADO,2.0,PADRE,93
96,2000-01-09,1994-12-19,1,1972-11-17,27.0,25000.0,ING.,1976-10-29,23.0,PROFESORA,3400.0,PROFESIONAL,EMPLEADO,PROFESIONAL,EMPLEADO,1.0,,61


assigning zodiac to both partners

In [359]:
#divorce_data['zodiac_man'] = []
#divorce_data['zodiac_woman'] = []

In [360]:
def zodiac(date):
  """
  Returns the zodiac sign and its number (Capricorn = 1, ..., Sagittarius = 12)
  for the date given.
  Aries        Mar 21 - Apr 19
  Taurus       Apr 20 - May 20
  Gemini       May 21 - Jun 20
  Cancer       Jun 21 - Jul 22
  Leo          Jul 23 - Aug 22
  Virgo        Aug 23 - Sep 22
  Libra        Sep 23 - Oct 22
  Scorpio      Oct 23 - Nov 21
  Sagittarius  Nov 22 - Dec 21
  Capricorn    Dec 22 - Jan 19
  Aquarius     Jan 20 - Feb 18
  Pisces       Feb 19 - Mar 20
  """
  if date.month == 1:
    if date.day <20:
      return "Capricorn",1
    else:
      return "Aquarius",2
  elif date.month == 2:
    if date.day <19:
      return "Aquarius",2
    else:
      return "Pisces",3
  elif date.month == 3:
    if date.day <21:
      return "Pisces",3
    else:
      return "Aries",4
  elif date.month == 4:
    if date.day < 20:
      return "Aries",4
    else:
      return "Taurus",5
  elif date.month == 5:
    if date.day <21:
      return "Taurus",5
    else:
      return "Gemini",6
  elif date.month == 6:
    if date.day <21:
      return "Gemini",6
    else:
      return "Cancer",7
  elif date.month == 7:
    if date.day <23:
      return "Cancer",7
    else:
      return "Leo",8
  elif date.month == 8:
    if date.day <23:
      return "Leo",8
    else:
      return "Virgo",9
  elif date.month == 9:
    if date.day <23:
      return "Virgo",9
    else:
      return "Libra",10
  elif date.month == 10:
    if date.day <23:
      return "Libra",10
    else:
      return "Scorpio",11
  elif date.month == 11:
    if date.day <22:
      return "Scorpio",11
    else:
      return "Sagittarius",12
  elif date.month == 12:
    if date.day <22:
      return "Sagittarius",12
    else:
      return "Capricorn",1
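
The month-by-month branches can also be expressed as a lookup table, which keeps the cutoff days in one place. This is a sketch of an alternative, not a replacement for the function above; `SIGNS`, `CUTOFFS`, and `zodiac_lookup` are hypothetical names, and the numbering (Capricorn = 1 through Sagittarius = 12) follows the function above.

```python
from datetime import date

# Sign that each month starts in: January starts in Capricorn, and so on
SIGNS = ["Capricorn", "Aquarius", "Pisces", "Aries", "Taurus", "Gemini",
         "Cancer", "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius"]
# CUTOFFS[m - 1] is the first day of month m that belongs to the next sign
CUTOFFS = [20, 19, 21, 20, 21, 21, 23, 23, 23, 23, 22, 22]

def zodiac_lookup(d):
    """Return (sign name, sign number) for a date-like object."""
    idx = d.month - 1
    if d.day >= CUTOFFS[idx]:
        idx = (idx + 1) % 12  # late December wraps back to Capricorn
    return SIGNS[idx], idx + 1

print(zodiac_lookup(date(1967, 7, 15)))  # ('Cancer', 7)
print(zodiac_lookup(date(1964, 8, 23)))  # ('Virgo', 9)
```

Any object with `month` and `day` attributes works, so the function also accepts pandas timestamps.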

In [361]:
zodiac_man = []
zodiac_num_man = []
for date in divorce_data['dob_man']:
  sign = zodiac(date)
  zodiac_man.append(sign[0])
  zodiac_num_man.append(sign[1])

divorce_data['zodiac_man'] = zodiac_man
divorce_data['zodiac_num_man'] = zodiac_num_man

zodiac_woman = []
zodiac_num_woman = []
for date in divorce_data['dob_woman']:
  sign = zodiac(date)
  zodiac_woman.append(sign[0])
  zodiac_num_woman.append(sign[1])

divorce_data['zodiac_woman'] = zodiac_woman
divorce_data['zodiac_num_woman'] = zodiac_num_woman

divorce_data.head()

Unnamed: 0,divorce_date,marriage_date,type_of_divorce,dob_man,age_partner_man,monthly_income_partner_man_peso,occupation_partner_man,dob_woman,age_partner_woman,occupation_partner_woman,monthly_income_partner_woman_peso,level_of_education_partner_man,employment_status_partner_man,level_of_education_partner_woman,employment_status_partner_woman,num_children,custody,marriage_months,zodiac_man,zodiac_num_man,zodiac_woman,zodiac_num_woman
42,2000-01-06,1989-08-19,1,1967-07-15,32.0,,CHOFER,1964-08-23,35.0,EMPLEADA,1500.0,SECUNDARIA,EMPLEADO,OTRO,EMPLEADO,1.0,MADRE,125,Cancer,7,Virgo,9
43,2000-01-06,1997-07-26,2,1967-06-18,32.0,,ING. AGRONOMO,1966-11-18,33.0,QUIMICA,,PROFESIONAL,EMPLEADO,PROFESIONAL,NO TRABAJA,0.0,,30,Gemini,6,Scorpio,11
80,2000-01-08,1995-05-27,1,1963-08-04,37.0,3000.0,PUBLICISTA,1972-08-25,27.0,SECRETARIA,1700.0,PROFESIONAL,ESTABLECIMIENTO,PREPARATORIA,EMPLEADO,1.0,MADRE,56,Leo,8,Virgo,9
79,2000-01-08,1992-04-09,1,1965-05-01,35.0,2200.0,FERROCARRILERO,1962-09-30,37.0,PROFESORA,2800.0,SECUNDARIA,EMPLEADO,PROFESIONAL,EMPLEADO,2.0,PADRE,93,Taurus,5,Libra,10
96,2000-01-09,1994-12-19,1,1972-11-17,27.0,25000.0,ING.,1976-10-29,23.0,PROFESORA,3400.0,PROFESIONAL,EMPLEADO,PROFESIONAL,EMPLEADO,1.0,,61,Scorpio,11,Scorpio,11


Below for data analysis:
- make a histogram of zodiac signs, with a bar for each sign on the x axis and the count on the y axis
- make a histogram with the average marriage length for each star sign
- find the mean and standard deviation of marriage duration (in years, not months) and make a histogram of the mean with the count on the y axis
- make a scatterplot with the man's zodiac on x and the woman's zodiac on y, so that each row entry is a point in the chart
- for all charts that you make, remember to label them properly (title, axes), call plt.show() at the end, and add a text cell underneath saying what you can draw from the data
- add a text cell under the code that drops NaN birth dates, justifying the drop: without the birthday we cannot find the zodiac sign and therefore cannot include those rows in our final evaluations
- eventually, we will be importing other data sets of compatibility scores for zodiac pairings from the internet, but we have not learned how to do that yet, so that part of our research question has not been addressed for now (add a text cell along these lines under the data analysis charts)
- reading the rubric: it is very important that we note and justify why we are including these specific charts and why we are analyzing this specific data
- grading for this part is based more on justification and analysis than on the actual charts
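
For the planned comparison with the compatibility matrix, each couple needs a key in the same form as `zodiac_combination` (for example `CancerPisces`). A rough sketch follows, assuming the matrix may list a pair in only one order, so both orders are tried. `lookup_compatibility` is a hypothetical helper, and the dictionary holds only the five rows shown in the head of the matrix.

```python
# Five rows copied from the head of the compatibility matrix
rates = {
    "CancerPisces": 0.98,
    "TaurusGemini": 0.33,
    "VirgoPisces": 0.88,
    "PiscesAries": 0.67,
    "CancerAquarius": 0.27,
}

def lookup_compatibility(sign_a, sign_b, rates):
    """Return the rate for a pair, trying both orders; None if absent."""
    for key in (sign_a + sign_b, sign_b + sign_a):
        if key in rates:
            return rates[key]
    return None

print(lookup_compatibility("Pisces", "Cancer", rates))  # 0.98
print(lookup_compatibility("Leo", "Virgo", rates))      # None
```

Whether the order of the two signs carries meaning in the matrix (for instance, man first) is not stated, so trying both orders is a guess that should be checked against the full file.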