# k-anonymity
In this notebook you will practice exploring and linking several fake datasets, de-identifying them, and then trying to re-identify the owners of the records. Think like an attacker who wants to gain as much information as possible. For example, the attacker may demand money based on the value of the information found about each person.

## Datasets
There are four datasets:
1. income.csv: the data an imaginary tax-related organization holds about its clients.
2. ip.csv: a simple example of the data an internet service provider (e.g. Shaw) holds.
3. hospital.csv: the data held by an insurance company that insures travellers.
4. creditcard.csv: the data held by a third-party credit-check organization.

### Load the datasets
Load each dataset as a separate dataframe and explore the data.

In [243]:
import pandas as pd

In [244]:
income = pd.read_csv('./income.csv')

In [245]:
ip = pd.read_csv('./ip.csv')

In [246]:
hospital = pd.read_csv('./hospital.csv')

In [247]:
creditcard = pd.read_csv('./creditcard.csv')

In [248]:
income

Unnamed: 0,name,lastname,ID,DOB,postal_code,color,companies,income
0,Monica Sanders,Ballard,raymondmoore,1990-10-13,92310,DarkBlue,Joseph-Burns,120000
1,Lorraine Hale,Simpson,sporter,2000-03-21,73196,SeaShell,Nguyen PLC,70000
2,Christopher Lee,Willis,biancasnyder,1992-03-19,86372,Bisque,Byrd-Walton,223546
3,Mikayla Henderson,Sanchez,ujackson,1945-04-02,19557,LightYellow,Pena Group,62345
4,Andre Smith,Mays,ewolfe,1983-11-25,94306,DarkSlateBlue,Schneider Inc,146098
5,Thomas Woods,Zimmerman,lindseyjames,1951-02-14,29648,LightBlue,Ferguson Group,56000
6,Andrea Freeman,Lopez,johnhunt,1949-02-24,10124,Fuchsia,"Martin, Alvarez and Young",231456
7,David Villanueva,Jones,kpetersen,1947-01-31,78788,OrangeRed,"Burns, Michael and Collins",210900
8,Blake Shaffer,Moore,xwillis,1958-10-26,77075,MediumAquaMarine,"Miller, Hanson and Roberts",93567
9,Tabitha Flowers,Scott,bennettjustin,1983-12-17,82698,IndianRed,Freeman-Perry,90000
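The `Unnamed: 0` column in these outputs is most likely a row index that was written into each CSV file when it was saved. The sketch below is a minimal illustration under that assumption. It uses a hypothetical two-row CSV string with the same columns as income.csv, not the real file, and shows that `read_csv(..., index_col=0)` turns that column back into the index.

```python
import io

import pandas as pd

# Hypothetical two-row CSV shaped like income.csv: the first header field is
# empty, which is what a CSV saved together with its index looks like.
csv_text = """,name,lastname,ID,DOB,postal_code,color,companies,income
0,Monica Sanders,Ballard,raymondmoore,1990-10-13,92310,DarkBlue,Joseph-Burns,120000
1,Lorraine Hale,Simpson,sporter,2000-03-21,73196,SeaShell,Nguyen PLC,70000
"""

# Without index_col, pandas keeps the saved index as a column named "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns[0])

# With index_col=0, the first column becomes the row index instead.
income_demo = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(income_demo.columns))
```

Whether to do this on the real files is a judgment call. Leaving the column in place is harmless, but it does get carried through every merge below.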


In [249]:
ip

Unnamed: 0,name,lastname,DOB,ip_address,location
0,Monica Sanders,Ballard,1990-10-13,192.0.8.93,"('53.7446', '-0.33525', 'Kingston upon Hull', ..."
1,Lorraine Hale,Simpson,2000-03-21,203.48.10.235,"('48.73218', '11.18709', 'Neuburg an der Donau..."
2,Christopher Lee,Willis,1992-03-19,198.51.98.53,"('35.06544', '1.04945', 'Frenda', 'DZ', 'Afric..."
3,Mikayla Henderson,Sanchez,1945-04-02,192.160.182.167,"('35.85', '117.7', 'Dongdu', 'CN', 'Asia/Shang..."
4,Andre Smith,Mays,1983-11-25,213.43.91.75,"('32.05971', '34.8732', 'Ganei Tikva', 'IL', '..."
5,Thomas Woods,Zimmerman,1951-02-14,192.29.160.209,"('-20.87306', '-48.29694', 'Viradouro', 'BR', ..."
6,Andrea Freeman,Lopez,1949-02-24,198.51.2.188,"('22.37066', '114.10479', 'Tsuen Wan', 'HK', '..."
7,David Villanueva,Jones,1947-01-31,198.58.178.92,"('48.52961', '12.16179', 'Landshut', 'DE', 'Eu..."
8,Blake Shaffer,Moore,1958-10-26,203.3.238.205,"('48.07667', '8.64409', 'Trossingen', 'DE', 'E..."
9,Tabitha Flowers,Scott,1983-12-17,192.52.207.100,"('38.37255', '34.02537', 'Aksaray', 'TR', 'Eur..."


In [250]:
hospital

Unnamed: 0,name,lastname,DOB,last_food,medical reason
0,Monica Sanders,Ballard,1990-10-13,banana,back pain
1,Lorraine Hale,Simpson,2000-03-21,apple,flue
2,Christopher Lee,Willis,1992-03-19,steak,vomiting
3,Mikayla Henderson,Sanchez,1945-04-02,coffee,fever
4,Andre Smith,Mays,1983-11-25,mocha,cancer
5,Thomas Woods,Zimmerman,1951-02-14,strawberry,cold
6,Andrea Freeman,Lopez,1949-02-24,apple,knee problem
7,David Villanueva,Jones,1947-01-31,gala,accident
8,Blake Shaffer,Moore,1958-10-26,chicken,flue
9,Tabitha Flowers,Scott,1983-12-17,chickenpie,injury


In [251]:
creditcard

Unnamed: 0,name,lastname,DOB,postal_code,credit_number,credit_provider,credit_security_code
0,Monica Sanders,Ballard,1990-10-13,92310,4760000000000000.0,VISA 13 digit,8
1,Lorraine Hale,Simpson,2000-03-21,73196,2220000000000000.0,Discover,644
2,Christopher Lee,Willis,1992-03-19,86372,373000000000000.0,JCB 16 digit,542
3,Mikayla Henderson,Sanchez,1945-04-02,19557,4.42e+18,Maestro,454
4,Andre Smith,Mays,1983-11-25,94306,4.85e+18,JCB 16 digit,297
5,Thomas Woods,Zimmerman,1951-02-14,29648,4.66e+18,American Express,188
6,Andrea Freeman,Lopez,1949-02-24,10124,4450000000000000.0,VISA 16 digit,565
7,David Villanueva,Jones,1947-01-31,78788,36300000000000.0,American Express,76
8,Blake Shaffer,Moore,1958-10-26,77075,4010000000000.0,JCB 16 digit,445
9,Tabitha Flowers,Scott,1983-12-17,82698,3510000000000000.0,VISA 19 digit,368


### De-identification
For each dataset, classify every column as one of the following and justify your answer:
1. explicit identifier
2. quasi identifiers
3. sensitive data
4. other

**income**

| explicit identifier | quasi identifiers | sensitive data | other |
| ------ | ------ | ------ | ------ |
| name | DOB | income | color |
| lastname | postal_code | companies |  |
| ID |  |  |  |


**Explanation:** In the income dataset, name, lastname and ID each point directly to a specific person, so they are explicit identifiers. DOB and postal_code do not identify a person on their own. Combined with each other, or joined with other datasets, they can single someone out, so they are quasi-identifiers. income and companies are private to the person but useful to researchers, so they are sensitive data. color is neither an identifier nor sensitive, so it is classified as other.
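One way to check the quasi-identifier argument is to measure how many rows a given column combination already singles out. The sketch below defines a hypothetical helper `unique_share` built on `DataFrame.duplicated`. It runs on four made-up toy rows, not the real data, to show how adding a second quasi-identifier can make every row unique. On the ten income rows printed above, every DOB is already distinct on its own.

```python
import pandas as pd


def unique_share(df: pd.DataFrame, cols: list[str]) -> float:
    """Fraction of rows whose combination of values in `cols` occurs exactly once."""
    # keep=False flags every row whose combination appears more than once
    repeated = df.duplicated(subset=cols, keep=False)
    return float((~repeated).mean())


# Hypothetical toy rows (not from the real datasets): two people share a DOB.
toy = pd.DataFrame({
    "DOB": ["1983-11-25", "1983-11-25", "1990-10-13", "1951-02-14"],
    "postal_code": [94306, 82698, 92310, 29648],
})

print(unique_share(toy, ["DOB"]))                 # DOB alone
print(unique_share(toy, ["DOB", "postal_code"]))  # DOB + postal_code
```

A share close to 1.0 means the combination behaves almost like an explicit identifier. That is exactly why DOB and postal_code are treated as quasi-identifiers.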

**ip**

| explicit identifier | quasi identifiers | sensitive data | other |
| ------ | ------ | ------ | ------ |
| name | DOB | ip_address |  |
| lastname | location |  |  |


**Explanation:** In the ip dataset, name and lastname directly identify a person, so they are explicit identifiers. DOB and location do not identify a person on their own, but combined they can single someone out, so they are quasi-identifiers. ip_address is sensitive because a bad actor could use it to target the person's computer, so it is sensitive data.

**hospital**

| explicit identifier | quasi identifiers | sensitive data | other |
| ------ | ------ | ------ | ------ |
| name | DOB | medical reason | last_food |
| lastname |  |  |  |


**Explanation:** In the hospital dataset, name and lastname directly identify a person, so they are explicit identifiers. DOB does not identify a person on its own, but combined with other attributes it can single someone out, so it is a quasi-identifier. medical reason is private to the person but useful to researchers, so it is sensitive data. last_food is neither an identifier nor sensitive, so it is classified as other.

**creditcard**

| explicit identifier | quasi identifiers | sensitive data | other |
| ------ | ------ | ------ | ------ |
| name | DOB | credit_number |  |
| lastname | postal_code | credit_provider |  |
|  |  | credit_security_code |  |


**Explanation:** In the creditcard dataset, name and lastname directly identify a person, so they are explicit identifiers. DOB and postal_code do not identify a person on their own, but combined they can single someone out, so they are quasi-identifiers. credit_number, credit_provider and credit_security_code are highly sensitive because leaking them could cause the person financial loss. They may still be useful to researchers, so they are sensitive data.

#### Anonymize data by removing explicit identifiers from each dataset

In [220]:
# Drop the explicit identifiers (name, lastname, ID) in place
income.drop(columns=["name", "lastname", "ID"], inplace=True)
income

Unnamed: 0,DOB,postal_code,color,companies,income
0,1990-10-13,92310,DarkBlue,Joseph-Burns,120000
1,2000-03-21,73196,SeaShell,Nguyen PLC,70000
2,1992-03-19,86372,Bisque,Byrd-Walton,223546
3,1945-04-02,19557,LightYellow,Pena Group,62345
4,1983-11-25,94306,DarkSlateBlue,Schneider Inc,146098
5,1951-02-14,29648,LightBlue,Ferguson Group,56000
6,1949-02-24,10124,Fuchsia,"Martin, Alvarez and Young",231456
7,1947-01-31,78788,OrangeRed,"Burns, Michael and Collins",210900
8,1958-10-26,77075,MediumAquaMarine,"Miller, Hanson and Roberts",93567
9,1983-12-17,82698,IndianRed,Freeman-Perry,90000


In [238]:
# Drop the explicit identifiers (name, lastname) in place
ip.drop(columns=["name", "lastname"], inplace=True)
ip

Unnamed: 0,DOB,ip_address,location
0,1990-10-13,192.0.8.93,"('53.7446', '-0.33525', 'Kingston upon Hull', ..."
1,2000-03-21,203.48.10.235,"('48.73218', '11.18709', 'Neuburg an der Donau..."
2,1992-03-19,198.51.98.53,"('35.06544', '1.04945', 'Frenda', 'DZ', 'Afric..."
3,1945-04-02,192.160.182.167,"('35.85', '117.7', 'Dongdu', 'CN', 'Asia/Shang..."
4,1983-11-25,213.43.91.75,"('32.05971', '34.8732', 'Ganei Tikva', 'IL', '..."
5,1951-02-14,192.29.160.209,"('-20.87306', '-48.29694', 'Viradouro', 'BR', ..."
6,1949-02-24,198.51.2.188,"('22.37066', '114.10479', 'Tsuen Wan', 'HK', '..."
7,1947-01-31,198.58.178.92,"('48.52961', '12.16179', 'Landshut', 'DE', 'Eu..."
8,1958-10-26,203.3.238.205,"('48.07667', '8.64409', 'Trossingen', 'DE', 'E..."
9,1983-12-17,192.52.207.100,"('38.37255', '34.02537', 'Aksaray', 'TR', 'Eur..."


In [163]:
# Drop the explicit identifiers (name, lastname) in place
hospital.drop(columns=["name", "lastname"], inplace=True)
hospital

Unnamed: 0,DOB,last_food,medical reason
0,1990-10-13,banana,back pain
1,2000-03-21,apple,flue
2,1992-03-19,steak,vomiting
3,1945-04-02,coffee,fever
4,1983-11-25,mocha,cancer
5,1951-02-14,strawberry,cold
6,1949-02-24,apple,knee problem
7,1947-01-31,gala,accident
8,1958-10-26,chicken,flue
9,1983-12-17,chickenpie,injury


In [221]:
# Drop the explicit identifiers (name, lastname) in place
creditcard.drop(columns=["name", "lastname"], inplace=True)
creditcard

Unnamed: 0,DOB,postal_code,credit_number,credit_provider,credit_security_code
0,1990-10-13,92310,4760000000000000.0,VISA 13 digit,8
1,2000-03-21,73196,2220000000000000.0,Discover,644
2,1992-03-19,86372,373000000000000.0,JCB 16 digit,542
3,1945-04-02,19557,4.42e+18,Maestro,454
4,1983-11-25,94306,4.85e+18,JCB 16 digit,297
5,1951-02-14,29648,4.66e+18,American Express,188
6,1949-02-24,10124,4450000000000000.0,VISA 16 digit,565
7,1947-01-31,78788,36300000000000.0,American Express,76
8,1958-10-26,77075,4010000000000.0,JCB 16 digit,445
9,1983-12-17,82698,3510000000000000.0,VISA 19 digit,368


### Re-identification by linking
Try to link the records across the datasets and re-identify them. Note that you may only be able to match records to one another without identifying the specific individual.
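The merges below can be made to report how well the link worked. `pd.merge` accepts `validate="one_to_one"`, which raises a `MergeError` if a join key repeats on either side. It also accepts `indicator=True`, which adds a `_merge` column with the values `both`, `left_only` or `right_only`. The sketch below is a minimal illustration on three rows copied from each of the income and creditcard outputs. The third row on each side was chosen so that it does not match.

```python
import pandas as pd

# Three de-identified rows from each dataset, copied from the outputs above.
income_toy = pd.DataFrame({
    "DOB": ["1990-10-13", "2000-03-21", "1945-04-02"],
    "postal_code": [92310, 73196, 19557],
    "income": [120000, 70000, 62345],
})
card_toy = pd.DataFrame({
    "DOB": ["1990-10-13", "2000-03-21", "1983-12-17"],
    "postal_code": [92310, 73196, 82698],
    "credit_provider": ["VISA 13 digit", "Discover", "VISA 19 digit"],
})

# validate="one_to_one" fails loudly if (DOB, postal_code) is not unique on
# either side, so every matched pair is a single candidate person.
linked = pd.merge(income_toy, card_toy, how="outer", on=["DOB", "postal_code"],
                  validate="one_to_one", indicator=True)

# Rows marked "both" are records linked across the two datasets.
print(linked["_merge"].value_counts())
```

If the one-to-one check passes on the full datasets, every `both` row links a tax record and a credit card to the same quasi-identifier values. In the de-identified data, that is the closest an attacker gets to re-identification.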


In [165]:
income_creditcard = pd.merge(income, creditcard, how="outer", on=["DOB", "postal_code"])

In [166]:
income_creditcard

Unnamed: 0,DOB,postal_code,color,companies,income,credit_number,credit_provider,credit_security_code
0,1990-10-13,92310,DarkBlue,Joseph-Burns,120000,4760000000000000.0,VISA 13 digit,8
1,2000-03-21,73196,SeaShell,Nguyen PLC,70000,2220000000000000.0,Discover,644
2,1992-03-19,86372,Bisque,Byrd-Walton,223546,373000000000000.0,JCB 16 digit,542
3,1945-04-02,19557,LightYellow,Pena Group,62345,4.42e+18,Maestro,454
4,1983-11-25,94306,DarkSlateBlue,Schneider Inc,146098,4.85e+18,JCB 16 digit,297
5,1951-02-14,29648,LightBlue,Ferguson Group,56000,4.66e+18,American Express,188
6,1949-02-24,10124,Fuchsia,"Martin, Alvarez and Young",231456,4450000000000000.0,VISA 16 digit,565
7,1947-01-31,78788,OrangeRed,"Burns, Michael and Collins",210900,36300000000000.0,American Express,76
8,1958-10-26,77075,MediumAquaMarine,"Miller, Hanson and Roberts",93567,4010000000000.0,JCB 16 digit,445
9,1983-12-17,82698,IndianRed,Freeman-Perry,90000,3510000000000000.0,VISA 19 digit,368


In [167]:
ip_hospital = pd.merge(ip, hospital, how="outer", on=["DOB"])

In [168]:
ip_hospital

Unnamed: 0,DOB,ip_address,location,last_food,medical reason
0,1990-10-13,192.0.8.93,"('53.7446', '-0.33525', 'Kingston upon Hull', ...",banana,back pain
1,2000-03-21,203.48.10.235,"('48.73218', '11.18709', 'Neuburg an der Donau...",apple,flue
2,1992-03-19,198.51.98.53,"('35.06544', '1.04945', 'Frenda', 'DZ', 'Afric...",steak,vomiting
3,1945-04-02,192.160.182.167,"('35.85', '117.7', 'Dongdu', 'CN', 'Asia/Shang...",coffee,fever
4,1983-11-25,213.43.91.75,"('32.05971', '34.8732', 'Ganei Tikva', 'IL', '...",mocha,cancer
5,1951-02-14,192.29.160.209,"('-20.87306', '-48.29694', 'Viradouro', 'BR', ...",strawberry,cold
6,1949-02-24,198.51.2.188,"('22.37066', '114.10479', 'Tsuen Wan', 'HK', '...",apple,knee problem
7,1947-01-31,198.58.178.92,"('48.52961', '12.16179', 'Landshut', 'DE', 'Eu...",gala,accident
8,1958-10-26,203.3.238.205,"('48.07667', '8.64409', 'Trossingen', 'DE', 'E...",chicken,flue
9,1983-12-17,192.52.207.100,"('38.37255', '34.02537', 'Aksaray', 'TR', 'Eur...",chickenpie,injury


In [169]:
all_connect = pd.merge(income_creditcard, ip_hospital, how="outer", on=["DOB"])

In [170]:
all_connect

Unnamed: 0,DOB,postal_code,color,companies,income,credit_number,credit_provider,credit_security_code,ip_address,location,last_food,medical reason
0,1990-10-13,92310,DarkBlue,Joseph-Burns,120000,4760000000000000.0,VISA 13 digit,8,192.0.8.93,"('53.7446', '-0.33525', 'Kingston upon Hull', ...",banana,back pain
1,2000-03-21,73196,SeaShell,Nguyen PLC,70000,2220000000000000.0,Discover,644,203.48.10.235,"('48.73218', '11.18709', 'Neuburg an der Donau...",apple,flue
2,1992-03-19,86372,Bisque,Byrd-Walton,223546,373000000000000.0,JCB 16 digit,542,198.51.98.53,"('35.06544', '1.04945', 'Frenda', 'DZ', 'Afric...",steak,vomiting
3,1945-04-02,19557,LightYellow,Pena Group,62345,4.42e+18,Maestro,454,192.160.182.167,"('35.85', '117.7', 'Dongdu', 'CN', 'Asia/Shang...",coffee,fever
4,1983-11-25,94306,DarkSlateBlue,Schneider Inc,146098,4.85e+18,JCB 16 digit,297,213.43.91.75,"('32.05971', '34.8732', 'Ganei Tikva', 'IL', '...",mocha,cancer
5,1951-02-14,29648,LightBlue,Ferguson Group,56000,4.66e+18,American Express,188,192.29.160.209,"('-20.87306', '-48.29694', 'Viradouro', 'BR', ...",strawberry,cold
6,1949-02-24,10124,Fuchsia,"Martin, Alvarez and Young",231456,4450000000000000.0,VISA 16 digit,565,198.51.2.188,"('22.37066', '114.10479', 'Tsuen Wan', 'HK', '...",apple,knee problem
7,1947-01-31,78788,OrangeRed,"Burns, Michael and Collins",210900,36300000000000.0,American Express,76,198.58.178.92,"('48.52961', '12.16179', 'Landshut', 'DE', 'Eu...",gala,accident
8,1958-10-26,77075,MediumAquaMarine,"Miller, Hanson and Roberts",93567,4010000000000.0,JCB 16 digit,445,203.3.238.205,"('48.07667', '8.64409', 'Trossingen', 'DE', 'E...",chicken,flue
9,1983-12-17,82698,IndianRed,Freeman-Perry,90000,3510000000000000.0,VISA 19 digit,368,192.52.207.100,"('38.37255', '34.02537', 'Aksaray', 'TR', 'Eur...",chickenpie,injury


### Anonymize 
Anonymize the income and credit card datasets. Use generalization or suppression on the postal code.
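The cells below use generalization. For comparison, the sketch below shows suppression, which masks the trailing digits of each postal code. The helper `suppress_postal` and its `keep` parameter are hypothetical, and the sketch assumes 5-digit codes stored as integers (as `read_csv` loaded them here).

```python
import pandas as pd


def suppress_postal(code, keep: int = 2) -> str:
    """Keep the first `keep` digits of a postal code and mask the rest with '*'."""
    # zfill(5) assumes 5-digit codes and restores leading zeros lost in an int column
    s = str(code).zfill(5)
    return s[:keep] + "*" * (len(s) - keep)


codes = pd.Series([92310, 73196, 86372, 19557])
masked = codes.map(suppress_postal)
print(masked.tolist())
```

Lowering `keep` makes the equivalence classes larger, so k goes up, at the cost of losing more geographic detail.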

In [222]:
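# Generalize a row's postal code into one of two ranges.
# NOTE: codes below 10000 would also be labeled "10000-39999".
def G_postal_code(Dframe):
    if Dframe["postal_code"] < 40000:
        return "10000-39999"
    else:
        return "40000-99999"

# Generalize a row's birth year (DOB must already be an int year) into two ranges.
# NOTE: every year before 1980, including 1971-1979, falls into the
# "1900-1970" bucket, so that label understates the range it covers.
def G_DOB(Dframe):
    if Dframe["DOB"] >= 1980:
        return "1980-2000"
    else:
        return "1900-1970"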
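As an alternative to the row-wise `apply`, `pd.cut` can bucket a whole column at once. The sketch below is a minimal illustration of that swapped-in technique, not the notebook's method. Its lower DOB label is `1900-1979`, so births in the 1970s are no longer mislabeled, which means its labels differ from the outputs shown below. With `right=False`, each bin includes its left edge, and any value outside the bins becomes NaN instead of being silently assigned a range.

```python
import pandas as pd


def generalize_year(years: pd.Series) -> pd.Series:
    # Bins are [1900, 1980) and [1980, 2001): right=False includes the left edge.
    return pd.cut(years, bins=[1900, 1980, 2001], right=False,
                  labels=["1900-1979", "1980-2000"])


def generalize_postal(codes: pd.Series) -> pd.Series:
    # Bins are [10000, 40000) and [40000, 100000), matching G_postal_code's labels.
    return pd.cut(codes, bins=[10000, 40000, 100000], right=False,
                  labels=["10000-39999", "40000-99999"])


years = generalize_year(pd.Series([1945, 1975, 1980, 2000])).astype(str).tolist()
postal = generalize_postal(pd.Series([92310, 19557])).astype(str).tolist()
print(years, postal)
```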

In [223]:
income["postal_code"] = income.apply(G_postal_code, axis=1)

In [224]:
income

Unnamed: 0,DOB,postal_code,color,companies,income
0,1990-10-13,40000-99999,DarkBlue,Joseph-Burns,120000
1,2000-03-21,40000-99999,SeaShell,Nguyen PLC,70000
2,1992-03-19,40000-99999,Bisque,Byrd-Walton,223546
3,1945-04-02,10000-39999,LightYellow,Pena Group,62345
4,1983-11-25,40000-99999,DarkSlateBlue,Schneider Inc,146098
5,1951-02-14,10000-39999,LightBlue,Ferguson Group,56000
6,1949-02-24,10000-39999,Fuchsia,"Martin, Alvarez and Young",231456
7,1947-01-31,40000-99999,OrangeRed,"Burns, Michael and Collins",210900
8,1958-10-26,40000-99999,MediumAquaMarine,"Miller, Hanson and Roberts",93567
9,1983-12-17,40000-99999,IndianRed,Freeman-Perry,90000


In [225]:
income["DOB"] = pd.to_datetime(income["DOB"]).dt.year
income["DOB"] = income.apply(G_DOB, axis=1)

In [226]:
income

Unnamed: 0,DOB,postal_code,color,companies,income
0,1980-2000,40000-99999,DarkBlue,Joseph-Burns,120000
1,1980-2000,40000-99999,SeaShell,Nguyen PLC,70000
2,1980-2000,40000-99999,Bisque,Byrd-Walton,223546
3,1900-1970,10000-39999,LightYellow,Pena Group,62345
4,1980-2000,40000-99999,DarkSlateBlue,Schneider Inc,146098
5,1900-1970,10000-39999,LightBlue,Ferguson Group,56000
6,1900-1970,10000-39999,Fuchsia,"Martin, Alvarez and Young",231456
7,1900-1970,40000-99999,OrangeRed,"Burns, Michael and Collins",210900
8,1900-1970,40000-99999,MediumAquaMarine,"Miller, Hanson and Roberts",93567
9,1980-2000,40000-99999,IndianRed,Freeman-Perry,90000


In [214]:
income[["DOB","postal_code"]].value_counts()

DOB        postal_code
1900-1970  10000-39999    6
1980-2000  10000-39999    5
           40000-99999    5
1900-1970  40000-99999    4
dtype: int64

In [227]:
creditcard['postal_code'] = creditcard.apply(G_postal_code, axis=1)

In [228]:
creditcard

Unnamed: 0,DOB,postal_code,credit_number,credit_provider,credit_security_code
0,1990-10-13,40000-99999,4760000000000000.0,VISA 13 digit,8
1,2000-03-21,40000-99999,2220000000000000.0,Discover,644
2,1992-03-19,40000-99999,373000000000000.0,JCB 16 digit,542
3,1945-04-02,10000-39999,4.42e+18,Maestro,454
4,1983-11-25,40000-99999,4.85e+18,JCB 16 digit,297
5,1951-02-14,10000-39999,4.66e+18,American Express,188
6,1949-02-24,10000-39999,4450000000000000.0,VISA 16 digit,565
7,1947-01-31,40000-99999,36300000000000.0,American Express,76
8,1958-10-26,40000-99999,4010000000000.0,JCB 16 digit,445
9,1983-12-17,40000-99999,3510000000000000.0,VISA 19 digit,368


In [229]:
creditcard["DOB"] = pd.to_datetime(creditcard["DOB"]).dt.year
creditcard["DOB"] = creditcard.apply(G_DOB, axis=1)

In [230]:
creditcard

Unnamed: 0,DOB,postal_code,credit_number,credit_provider,credit_security_code
0,1980-2000,40000-99999,4760000000000000.0,VISA 13 digit,8
1,1980-2000,40000-99999,2220000000000000.0,Discover,644
2,1980-2000,40000-99999,373000000000000.0,JCB 16 digit,542
3,1900-1970,10000-39999,4.42e+18,Maestro,454
4,1980-2000,40000-99999,4.85e+18,JCB 16 digit,297
5,1900-1970,10000-39999,4.66e+18,American Express,188
6,1900-1970,10000-39999,4450000000000000.0,VISA 16 digit,565
7,1900-1970,40000-99999,36300000000000.0,American Express,76
8,1900-1970,40000-99999,4010000000000.0,JCB 16 digit,445
9,1980-2000,40000-99999,3510000000000000.0,VISA 19 digit,368


In [231]:
creditcard[["DOB","postal_code"]].value_counts()

DOB        postal_code
1900-1970  10000-39999    6
1980-2000  10000-39999    5
           40000-99999    5
1900-1970  40000-99999    4
dtype: int64

#### Question: Is it k-anonymous?
What is the maximum k for which you can make each of the credit card and income datasets k-anonymous?


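_Yes. After generalization, the smallest equivalence class over (DOB, postal_code) contains 4 records in both income and creditcard (see the value counts above), so both datasets are 4-anonymous. With this generalization, k=4 is the largest k they satisfy._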
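The answer above is the size of the smallest equivalence class. The sketch below computes it with a hypothetical helper `k_anonymity`. The input is rebuilt from the group counts printed by `value_counts()` (6, 5, 5 and 4 records), so only the two generalized quasi-identifier columns are included.

```python
import pandas as pd


def k_anonymity(df: pd.DataFrame, quasi_identifiers: list[str]) -> int:
    """k = size of the smallest group of rows sharing the same quasi-identifier values."""
    return int(df.groupby(quasi_identifiers).size().min())


# Rebuilt from the value_counts() output above: (DOB, postal_code, count).
group_counts = [
    ("1900-1970", "10000-39999", 6),
    ("1980-2000", "10000-39999", 5),
    ("1980-2000", "40000-99999", 5),
    ("1900-1970", "40000-99999", 4),
]
rows = [(dob, pc) for dob, pc, n in group_counts for _ in range(n)]
generalized = pd.DataFrame(rows, columns=["DOB", "postal_code"])

print(k_anonymity(generalized, ["DOB", "postal_code"]))
```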

#### Question: Does it need l-diversity?

_Neither the income nor the creditcard dataset needs further l-diversity treatment, because within each quasi-identifier (QI) group the sensitive values are already all distinct. In the income dataset, the sensitive attributes are companies and income. Within each QI group, every company name is different and the income values are different and widely spread, so knowing which group a person is in does not reveal their sensitive values. An attacker who knows a person's QI group has only a 1/(group size) chance of guessing that person's company and income. That chance is at most 1/4 here, since the smallest group has 4 records. The same reasoning applies to the creditcard dataset._
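The claim that every QI group holds distinct sensitive values can be checked directly. The sketch below computes distinct l-diversity, defined as the smallest number of distinct sensitive values in any QI group, using a hypothetical helper `distinct_l_diversity`. It only uses the ten generalized income rows printed above, so the groups are smaller (5, 3 and 2 records) than in the full 20-row dataset and l comes out lower than it would on the full data.

```python
import pandas as pd


def distinct_l_diversity(df: pd.DataFrame, qi: list[str], sensitive: str) -> int:
    """Smallest number of distinct sensitive values found in any QI group."""
    return int(df.groupby(qi)[sensitive].nunique().min())


# The ten generalized income rows printed above (other columns omitted).
printed = pd.DataFrame({
    "DOB": ["1980-2000", "1980-2000", "1980-2000", "1900-1970", "1980-2000",
            "1900-1970", "1900-1970", "1900-1970", "1900-1970", "1980-2000"],
    "postal_code": ["40000-99999", "40000-99999", "40000-99999", "10000-39999",
                    "40000-99999", "10000-39999", "10000-39999", "40000-99999",
                    "40000-99999", "40000-99999"],
    "income": [120000, 70000, 223546, 62345, 146098,
               56000, 231456, 210900, 93567, 90000],
})

qi = ["DOB", "postal_code"]
groups = printed.groupby(qi)
# True if every group's income values are all different from each other
all_distinct = bool((groups["income"].nunique() == groups.size()).all())
print(distinct_l_diversity(printed, qi, "income"), all_distinct)
```

Distinct values alone do not rule out every inference. For example, a group whose incomes were all very high would still leak that fact. For the sensitive values printed above, however, the check agrees with the answer.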

### Try relocating the credit cards
Try finding out the location of the credit card holders by linking the dataset to the ip dataset. What do you find?

In [239]:
ip["DOB"] = pd.to_datetime(ip["DOB"]).dt.year
ip["DOB"] = ip.apply(G_DOB, axis=1)

In [240]:
ip


Unnamed: 0,DOB,ip_address,location
0,1980-2000,192.0.8.93,"('53.7446', '-0.33525', 'Kingston upon Hull', ..."
1,1980-2000,203.48.10.235,"('48.73218', '11.18709', 'Neuburg an der Donau..."
2,1980-2000,198.51.98.53,"('35.06544', '1.04945', 'Frenda', 'DZ', 'Afric..."
3,1900-1970,192.160.182.167,"('35.85', '117.7', 'Dongdu', 'CN', 'Asia/Shang..."
4,1980-2000,213.43.91.75,"('32.05971', '34.8732', 'Ganei Tikva', 'IL', '..."
5,1900-1970,192.29.160.209,"('-20.87306', '-48.29694', 'Viradouro', 'BR', ..."
6,1900-1970,198.51.2.188,"('22.37066', '114.10479', 'Tsuen Wan', 'HK', '..."
7,1900-1970,198.58.178.92,"('48.52961', '12.16179', 'Landshut', 'DE', 'Eu..."
8,1900-1970,203.3.238.205,"('48.07667', '8.64409', 'Trossingen', 'DE', 'E..."
9,1980-2000,192.52.207.100,"('38.37255', '34.02537', 'Aksaray', 'TR', 'Eur..."


In [241]:
reloc_credit_ip = pd.merge(creditcard, ip, how="outer", on=["DOB"])

In [242]:
reloc_credit_ip

Unnamed: 0,DOB,postal_code,credit_number,credit_provider,credit_security_code,ip_address,location
0,1980-2000,40000-99999,4.760000e+15,VISA 13 digit,8,192.0.8.93,"('53.7446', '-0.33525', 'Kingston upon Hull', ..."
1,1980-2000,40000-99999,4.760000e+15,VISA 13 digit,8,203.48.10.235,"('48.73218', '11.18709', 'Neuburg an der Donau..."
2,1980-2000,40000-99999,4.760000e+15,VISA 13 digit,8,198.51.98.53,"('35.06544', '1.04945', 'Frenda', 'DZ', 'Afric..."
3,1980-2000,40000-99999,4.760000e+15,VISA 13 digit,8,213.43.91.75,"('32.05971', '34.8732', 'Ganei Tikva', 'IL', '..."
4,1980-2000,40000-99999,4.760000e+15,VISA 13 digit,8,192.52.207.100,"('38.37255', '34.02537', 'Aksaray', 'TR', 'Eur..."
...,...,...,...,...,...,...,...
195,1900-1970,40000-99999,4.270000e+15,VISA 19 digit,845,100.38.177.193,"('7.6', '4.18333', 'Olupona', 'NG', 'Africa/La..."
196,1900-1970,40000-99999,4.270000e+15,VISA 19 digit,845,203.16.148.93,"('34.75856', '136.13108', 'Ueno-ebisumachi', '..."
197,1900-1970,40000-99999,4.270000e+15,VISA 19 digit,845,192.31.67.82,"('0.46005', '34.11169', 'Busia', 'KE', 'Africa..."
198,1900-1970,40000-99999,4.270000e+15,VISA 19 digit,845,192.58.175.42,"('34.06635', '-84.67837', 'Acworth', 'US', 'Am..."


_After anonymization, each credit card record links to 10 different IP/location records, namely every ip record in the same generalized DOB range. So we cannot determine the card holder's location._
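This conclusion can be expressed as a count of candidate matches per card. The sketch below keeps each card's row label with `reset_index()`, joins on the generalized DOB, and counts distinct IP addresses per card. It uses only the ten printed rows of creditcard and ip (generalized DOB values and a single other column from each). Each card therefore gets 5 candidates here, compared with the 10 seen in the full 20-row data.

```python
import pandas as pd

# Generalized DOB for the ten printed rows; the two datasets share the same order.
dob = ["1980-2000", "1980-2000", "1980-2000", "1900-1970", "1980-2000",
       "1900-1970", "1900-1970", "1900-1970", "1900-1970", "1980-2000"]
card = pd.DataFrame({"DOB": dob, "credit_provider": [
    "VISA 13 digit", "Discover", "JCB 16 digit", "Maestro", "JCB 16 digit",
    "American Express", "VISA 16 digit", "American Express", "JCB 16 digit",
    "VISA 19 digit"]})
ip_gen = pd.DataFrame({"DOB": dob, "ip_address": [
    "192.0.8.93", "203.48.10.235", "198.51.98.53", "192.160.182.167",
    "213.43.91.75", "192.29.160.209", "198.51.2.188", "198.58.178.92",
    "203.3.238.205", "192.52.207.100"]})

# reset_index() stores each card's row label in an "index" column before joining.
linked = card.reset_index().merge(ip_gen, on="DOB", how="inner")
candidates = linked.groupby("index")["ip_address"].nunique()
print(candidates.min(), candidates.max())
```

A card would be re-locatable only if its candidate count dropped to 1. After generalization, no card in the printed rows gets below 5.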
