---

_You are currently looking at **version 1.1** of this notebook. To download notebooks and datafiles, as well as get help on Jupyter notebooks in the Coursera platform, visit the [Jupyter Notebook FAQ](https://www.coursera.org/learn/python-machine-learning/resources/bANLa) course resource._

---

## Understanding and Predicting Property Maintenance Fines

This project is based on a data challenge from the Michigan Data Science Team ([MDST](http://midas.umich.edu/mdst/)). 
The task is to predict whether a given blight ticket will be paid on time.

[Blight violations](http://www.detroitmi.gov/How-Do-I/Report/Blight-Complaint-FAQs) are issued by the city to individuals who allow their properties to remain in a deteriorated condition. Every year, the city of Detroit issues millions of dollars in fines to residents and every year, many of these fines remain unpaid. Enforcing unpaid blight fines is a costly and tedious process, so the city wants to know: how can we increase blight ticket compliance?

The first step in answering this question is understanding when and why a resident might fail to comply with a blight ticket. This is where predictive modeling comes in. 
All data for this assignment has been provided  through the [Detroit Open Data Portal](https://data.detroitmi.gov/).

___

Two data files for use in training and validating the models are train.csv and test.csv. Each row in these two files corresponds to a single blight ticket, and includes information about when, why, and to whom each ticket was issued. The target variable is compliance, which is True if the ticket was paid early, on time, or within one month of the hearing data, False if the ticket was paid after the hearing date or not at all, and Null if the violator was found not responsible. Compliance, as well as a handful of other variables that will not be available at test-time, are only included in train.csv.

Note: All tickets where the violators were found not responsible are not considered during evaluation. They are included in the training set as an additional source of data for visualization, and to enable unsupervised and semi-supervised approaches. However, they are not included in the test set.

<br>

**File descriptions** (Use only this data for training your model!)

    readonly/train.csv - the training set (all tickets issued 2004-2011)
    readonly/test.csv - the test set (all tickets issued 2012-2016)
    readonly/addresses.csv & readonly/latlons.csv - mapping from ticket id to addresses, and from addresses to lat/lon coordinates. 
     Note: misspelled addresses may be incorrectly geolocated.

<br>

**Data fields**

train.csv & test.csv

    ticket_id - unique identifier for tickets
    agency_name - Agency that issued the ticket
    inspector_name - Name of inspector that issued the ticket
    violator_name - Name of the person/organization that the ticket was issued to
    violation_street_number, violation_street_name, violation_zip_code - Address where the violation occurred
    mailing_address_str_number, mailing_address_str_name, city, state, zip_code, non_us_str_code, country - Mailing address of the violator
    ticket_issued_date - Date and time the ticket was issued
    hearing_date - Date and time the violator's hearing was scheduled
    violation_code, violation_description - Type of violation
    disposition - Judgment and judgement type
    fine_amount - Violation fine amount, excluding fees
    admin_fee - $20 fee assigned to responsible judgments
state_fee - $10 fee assigned to responsible judgments
    late_fee - 10% fee assigned to responsible judgments
    discount_amount - discount applied, if any
    clean_up_cost - DPW clean-up or graffiti removal cost
    judgment_amount - Sum of all fines and fees
    grafitti_status - Flag for graffiti violations
    
train.csv only

    payment_amount - Amount paid, if any
    payment_date - Date payment was made, if it was received
    payment_status - Current payment status as of Feb 1 2017
    balance_due - Fines and fees still owed
    collection_status - Flag for payments in collections
    compliance [target variable for prediction] 
     Null = Not responsible
     0 = Responsible, non-compliant
     1 = Responsible, compliant
    compliance_detail - More information on why each ticket was marked compliant or non-compliant


___

## Evaluation

This model's predictions are given as the probability that the corresponding blight ticket will be paid on time.


___

A function  training a model to predict blight ticket compliance in Detroit using `readonly/train.csv` is created. Using this model, a series of length 61001 is returned with the data being the probability that each corresponding ticket from `readonly/test.csv` will be paid, and the index being the ticket_id.
       


In [7]:
import pandas as pd
import numpy as np
  


def blight_model():
    from sklearn.model_selection import train_test_split,GridSearchCV
    from sklearn.metrics import roc_auc_score, roc_curve, auc
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.ensemble import RandomForestClassifier
    
    train = pd.read_csv('train.csv',  index_col = 'ticket_id', encoding ="ISO-8859-1")
    dftest = pd.read_csv('test.csv',index_col = 'ticket_id',encoding ="ISO-8859-1")
    geolocation = pd.read_csv('latlons.csv')
    address = pd.read_csv('addresses.csv')
    
    #merging dfs
    geoadd=pd.merge(address, geolocation, on='address')
    geoadd=geoadd.set_index('ticket_id')
    trainm=pd.merge(train,geoadd, left_index=True, right_index=True)
    
    dftrain=trainm[(trainm['compliance'] ==0) | (trainm['compliance']==1)]
    
    
    selected=dftrain.loc[dftrain['violation_code']== '61-5-21']
    lst=['agency_name','violation_code', 'violation_description', 'disposition', 'fine_amount',
       'admin_fee', 'state_fee', 'late_fee', 'discount_amount','judgment_amount','grafitti_status', 'compliance']
    business=selected[selected.violator_name.isin(['HISTORIC EBRICK CO., *', 'INVESTMENTS LLC, TEL-96',
       'CO., HISTORIC BIRCK','ATTN:DENNIS KEFALLINOS, BROOKLYN PROPERTIES INC.', 'STREET LLC, BRUSH',
       'STREET LOFTS INC, ELIZABETH', 'INVESTMENT, GREEKTOWN',
       'GARAGE LLC, GEM', 'PROPERTIES, IVORY', 'BLDG INC, CAREY',
       'DEVELOPMENT CO, PRESTIGE', 'PROPERTIES, BROOKYLN',
       'DEVELOPMENT OF MICH LLC, OLYMPIA', 'PROERTIES, IVORY',
       'PROPERTIES LLC, BROOKLYN', 'DEVELOPMENT OF MI LLC, OLYMPIA',
       'PROPERTIES INC, BROOKLYN', 'STREET LOFTS, ELIZABETH',
       'GARAGE, GEM', 'DEVELOPMENT, OLYMPIA', 'PROPERTY, BROOKLYN',
       'LOFTS INC, ELIZABETH STREET', 'LOFTS INC, ELIZABEH STREET',
       "/MCDONALD'S, SONYA SHIELD",'ENTERPRISE LLC, SALEEM ALTERA', 'OIL INC., M & A',
       'GAS MART, JOHN R', 'KFC OF AMERICA, L & A ARCHITECTS',
       'PETROLEUM, INC., H & M','GRAPHIX LLC, N V',
       'SERVICE INC, ALLIED TOWING','LLC STE  1745, PRIME PARKING', 'MANAGEMENT, TGC',
       'LDHALP KATHYS MAKINO, HELISA SQUARE', 'LLC, PRIME PARKING',
       'UNITED METHODIST CHURCH, CASS COMMUNITY', 'LLC, FIRST THIRD','ENTERPRISES, INC., LITTLE CEASAR', 'CHURCH, NORTHSIDE APOSTOLIC',
       'HOUSE , LELAND', 'CORP-M HIGGENS, WITHERELL','DDA, CITY OF DETROIT','SQUARE LDHAP, HELISA','METHODIST CHURCH, CASS COMMUNITY UNITED', 'THIRD LLC, FIRST',
        'METHODIST CHURCH, CASS COMMUNITY UNITED', 'THIRD LLC, FIRST','COLLINS III, CARL L.', 'LLC, BOULDER DEVELOPMENT',
       'UNIVERSITY AUTO, AMIN-HARB', 'LLC, WOODWARD OFFICE','COLLINS III, CARL L', 'OVERSEAS CORP, SMB',
       'C/O LA INSURANCE, HANI KASSAB', 'CURRENCY, GREAT LAKES','VAULT, YOUR PERSONAL','CURRENY, GREAT LAKES',
       'INSURANCE, L.A.',"CORP., WITHRELL","AFRICAN MARKET, BAMBA'S", 'MASTER, WEAR',"PLACE, TERRY'S", 'LLC, 3RD BAR', 'DEVELOPMENT LLC, BOULDER',
       'CHURCH, UNITY MISSION', 'NUBIAN QUEENS, AFRICAN','DEVELOPMENT, LLC, BOULDER','WARREN INVESTMENTS , ....', 'WARREN INVESTMENTS INC, ...',
       'WARREN INVESTMENTS, ....','OAK GROVE AME, ...', 'LLC, COLOSSAL', 'INVESTMENTS, MOGHOL', 'LOFTS LLC, TRUMBULL',
       'STUDIOS L & D, HARMONIE', 'Town, Meat', 'PROPERTIES, LAWRENCE WOLF', 'LLC, TEL-SEVEN',
       'YOUR PERSONAL VAULT, .....', 'SOUTHEASTERN BLDG GROUP LLC, ....','BUILDING GROUP LLC, SOUTHEASTERN', 'CENTER INC, MUSIC HALL',
       'PERSONAL VAULT, YOUR','Center Inc, Music Hall'])]


    person=selected[selected.violator_name.isin(['ABDOUL, MAWARI', 'SHILLCUTT, VINCENT', 'JOHNSON, ERIC',
       'TATARIAN, GREGORY', 'WILLIAMS, COLLETTE', 'MOHAMMAD, SAMIR', 'PATTAH, JERRY',
       'BASHI, DOLOR', 'AZZOU, MAJID','ZAMOSKY, RALPH','AJROUCHE, ATTA H', 'KEVERSON, MILTON L', 'KEVRESON, MILTON L','FRANKLIN, CATHERINE', 
        'CHOLAK, RICHARD','YOUSIF, MIKE', 'NAJOR, HANI','ABOO, ROBERT', 'Yousif, Mike','COPPERSMITH, JAY','WEAKS, CHRIS', 'daoud, emmanuel','DAVIES, KENNETH','POKUAAH, SQUITTER', 'Mosley, William',
        'SHANGO, MASOUD','MOSLEY, WILLIAM', 'HART SR, CALVIN', 'JERKINS, WAYMON','MOSLEY, WILLIAM J', 'MOSELY, WILLIAM J', 'DUNN, BARBARA',
       'ZABON, EDDIE', 'EARLY, DAISY','FLOWERS, MARVIN', 'ZABEN, EDDIE','WEINSTEIN, ARON',
       'Ahmad, Mustapha', 'BOGHOSSIAN, NAZAR','LOCKHART, CARNELL', 'AMMORI, FAROOK' ])]


    business['violation_code'].replace('61-5-21', '61-5-211',inplace=True)        
    person['violation_code'].replace('61-5-21', '61-5-212',inplace=True)

    subset521=business.append(person)
    subset521=subset521[lst]
    ddtrain=dftrain.loc[dftrain['violation_code']!= '61-5-21'].copy() 
    dtrain=ddtrain.append(subset521)
    dtrain=dtrain[lst]

    # Test set

    selec21=dftest.loc[dftest['violation_code']== '61-5-21']
    selec21=selec21[['agency_name',
     'violation_code', 'violation_description','violator_name', 'disposition','fine_amount','admin_fee',
     'state_fee','late_fee','discount_amount','judgment_amount','grafitti_status']]
    lnd=['agency_name','violation_code','violation_description','disposition', 'fine_amount',
       'admin_fee', 'state_fee', 'late_fee', 'discount_amount','judgment_amount', 'grafitti_status']
    ddtest=dftest[lnd]
    businesst=selec21[selec21.violator_name.isin(['LLC, MMJM,', 'VAULT, YOUR PERSONAL',
       'COMPANY LLC, CAMPUS VILLAGE HOLDING', 'CENTER, INC, MUSIC HALL',
       'CENTER, INC., MUSIC HALL', 'CENTER INC, MUSIC HALL','Center Inc, Music Hall', "MARILYN'S PLACE, .",
       'NEW MILLENIUM MISS. BAPT. CHUR, .','Group, Outer Drive', "EL Lynn's Kitchen, .",'GROUP, FARBMAN',
       'ATTN DAVID ROTHENSTEIN, BURGER KING CORP #10287',
       'ATTN KHALIDA KAJY, SNOWDEN PROPERTIES LLC',
       'ST. JOHNS REDEMPTION MISSION', 'PRECISE CROF LLC','Pennington Group, LLC',
       'Executive Inn of Michigan, LLc/AKa The City Airport Inn/Baljit S. Bains',
       'S & Y Gratiot, LLC', '12301 Gratiot, LLC',
       'Alvin Ribiat Property, LLC', 'Property Venture, LLC',
       'Oram, Jason', 'Gratiot-Novara Investment, LLC','Digital Product TV Promotion & Management Group, LLC',
       'Sunny Horizons Lending, LLC','Michigan Central, LLC','Salvation Properties, LLC', 'MLM, LLC',
       'Full Gospel House of Prayer','Istanbull, LLC', 'ANTHONY YOUSIF INV. CO. LLC','JOY RD. DETROIT LLC','OSCAR GOODWIN RENTAL CORP.', 'RENDLES DEVELOPMENT GROUP',
       '8 MILE EVERGREEN, LLC','Sylvia S. Leverett Black Bottom Transportation','Larot Enterprises, Inc.',
       'Strather & Associates', 'Hill, Kenneth & Londa','INKSTER INVESTMENTS LLC', 
       'EJVJ Enterprises, Inc.', 'Great Faith Ministries International, Inc.','Atcom, LLC', 'Pioneer & Settler, LLC', 
       'Carson Mas, LP/Capital Automotive, LP','JN Grand River Property, LLC', 'Gjonlleshi, Pashk & Marte',
       'Al Jamil Auto Sales, LLC', 'Grand Detroit Holding, LLC','1st Choice Transportation Co.',
        'CK REALTY GROUP ESTELLA CARTER,CHHAYA K SHAW', 'Danya Investment Co.', 
       '13544 W. Grand River, LLC', 'Williams, Lawrence','Isaac Akhnana, LLC', 'CRE 14400 Grand River',
       'Detroit Leasing, Inc.','New Morning Star Baptist Church of Detroit', 'Fallena, LLC',
       'Devine & Devine', 'Powell, Georgia & Dozie','Platinum Investments Holdings, LLC', 'Vernor Group, LLC',
       '13001 W. McNichols, LLC', 'Siberian, LLC', 'Plan O, LLC','NORTHWEST DETROIT NEIGHBORHOOD DEVELOPMENT CORP',
       'Kabba, Tiran Kay', 'FMM Bushnell Great Lakes, LLC','Mitchell, Clarence', 'MJCFDS, LLC',
       'Tri-Vision, LLC', 'Dinverno, Inc.','The Works Home Improvement & Repair, Inc.', 'Tiger Jurisdiction Properties, LLC', 'Rosaco Investments, Inc.',
       'Mercedes Boulding', "Ava's World Famous, LLC",'Carrion, Selina M.','THE LIBERTY GROUP INC','Global Premier Asset Servising', 'Allicor, Inc.',
       'Design Bank, LLC', 'Triangle Management/Ralph Sachs','Church Of The Living God Christian Center', 'Citizens Bank',
       'D & J Trendell Trustees', 'VOLVO TRANSPORTATION','RAW MATERIALS LLC', 'Rosaco Investments', 
       'Cromer Enterprises, Inc./Attn:  John Cromer', 'Melton And Company, PLCC',
       'Peter James Management, LLC', 'Goldfarb Bonding Agency, LLC','Stamco Holdings, LLC/Michael Schorer',
       'M.L.M., LLC', '5TH Ave. Ventures, LLC/Attn:  Quinnon Martin III','Timothy Driscoll, Mauricio MicKam &', 'All City Investments, LLC',
       'Buy Here Pay Here Real Estate, LLC', 'Astoria Cab Leasing, LLC', 'The Garvey Trust', 'Reynaldo Godinez, Harold Yepes &',
        '7300 W. Seven Mile, LLC', 'LA Group, LLC', 'Bayview Loan Servicing, LLC','Al Jamil Auto Sales, Inc.', 'Istanbul, Inc.','Elevation Fellowship Temple',
                                             'BELIEVE IN THE LIGHT/RONDA PRATT', 'DEXTER & JOY LLC', 'JOHNSON, RALPH', 'VERNOR GROUP LLC',
       'FAMILY TRUST, THE RIDLEY IRREVOCABLE', '3431 JOY LLC','LITLLE SCHOLARS CHILD DEVELOPMENT CENTER LLC', 'OC BARNES','NEW MT CALVARY CHURCH', 'BOYDELL LLC', 'W. J. TRADING CORP.',
        'S & N REAL ESTATE','CHUNG, ANDY & JAMES', '& TWANDA D WILLIS, RUFUS C THOMAS', 'ASSET MANAGEMENT UNLIMITED LLC', 'KSM Holdings LLC',
         'JUSTIN INVESTMENTS', 'MICHIGAN CENTRAL LLC', 'OAK GROVE AFRICAN METHODIST EPISCOPAL CHURCH',
       'GRAND DETROIT HOLDING, LLC', 'ATCOM LLC', 'SIBERIAN LLC','MAIDEN HOUSE MINISTRY','TRIANGLE MANAGEMENT', 'RILEY IRREVOCABLE TRUST',
       'THE BLACKWOOD ORGANIZATION INC','MICHIGAN & LONYO ULTIMATE SERVICE','WORD OF LIFE CHRISTIAN CENTER-JEROME CAMPBELL',
       'JULIUS HOLLEY III', 'SAM & CROCIFISA MIGLIORE','THE LIBERTY GROUP', 'EVANGELISTIC HOUSE OF GOD','ALEXS DETROIT CONEY LLC', '52R LLC',
       'WARWICK EXPEDITED FRIEGHT LLC', 'TWENTIETH STREET ASSOCIATES LLC', 'AMERICAN INTERNATIONAL TRADE & AUCTION',
       '7510 MICHIGAN INVESTMENTS LLC', 'VERNOR CENTRAL LLC','GRAND DETROIT HOLDING LLC','BASCO OF MICHIGAN','SAFADI, EDMUND & JACKIE',
                                             'DETROIT LEASING INC', 'TEMPLE COMMONS, LLC CSC LAWYER','KSM HOLDINGS LLC', 'PAB INVESTMENTS INC',
       'GUNSTON FAMILY MARKET INC', 'WOODWARD LLC','7510 MICH INVESTMENTS LLC','AMERICAN INTERNATIONAL TRADE & AUCTIONS',
       'UNIVERSAL CONSTRUCTION, INC', 'ROADMASTERS TRANSPORTATION INC','KFC U. S. PROPERTIES, INC.','JEFFERSON 14272 LLC', 'INTERSTATE BUSINESS EXCHANGE',
       'Bayview Loan Servicing, REO', 'CITIZENS', 'SIGN A RAMA', 'CARROLL INSTALLATION','BRIGHT STAR INC 2011',  'MID-MICHIGAN NEON',
       'AVER SIGN CO', '555 NON PROFIT STUDIO GALLERY', 'MUSIC HALL CENTER INC', 'ST REGIS ENTERPRISE',
       'REGENCY OF MICHIGAN', 'CREW ENTERPRISES','GRATIOT MCDOUGALL HOME LLC', 'TWO 4 THE SHOW LLC', 'KHALIL BROS. INC.', 'BAGLEY ACQUISITION',
        'LYNEX INC./SCOT TURNBULL', 'BEAL PROPERTIES/STEWART BEAL', '1401 COMPANY','1301 BORADWAY LLC/BEDROCK REAL ESTATE SERVICES','1133 GRISWOLD LLC', 'BROADWAY MERCHANTS / JEAN NASH',
       'FADIA PROPERTIES LLC','L&J INVESTMENT CO. LLC', 'CADILLAC TOWER MI LLC/CO FARBMAN GROUP', 'AJ GRATIOT LLC','KNIGHT ENTERPRISE',
        'D & K INVESTMENT GROUP LLC','TRIPLE PROPERTIES DETROIT',  'YALDO & HABBO LLC', 'THE GLOBAL INVESTMENT LLC', 'VINCENT BRUGLIO, DAVID BAHRI, FADI ANTONE', 'JETCO PROPERTIES',
        '17326 E WARREN LLC', 'PASADENA INVESTMENTS', 'CLIFFORD STREET PROPERTIES', 'BAGLEY CLIFFORD, LLC / ATTN: J. BARBAT', 
       'ALPHA MANOR NURSING HOME', 'PRO CARE HEALTH PLAN','MHT PROPERTY, LLC', 'WARWICK EXPEDITED FREIGHT LLC', 'JMJ MANAGEMENT LLC',  '& NANCY FARRIGIA, GEORGE',
       'BROWN SUGAR HAIR SALON LLC', 'BELIEVE IN THE LIGHT','ALBE CONSTRUCTION', "UNCLE E'S FARM, LLC",
       'KMML PROPERTIES WCT LLC', 'VINCENT BRUGIO, DAVID BAIR, FADI ANTONE', 'WOODWARD SIX MILE PROPERTY LLC',
       '& DONNA L PARSON, HAROLD STEVEN', 'M-ONE PROPERTIES','THE VINTAGE GROUP', 'NESS BORIS INC',
       'J & B DEVELOPMENT CO LLC','BAGLEY CLIFFORD LLC','Prevention Medicine of Detroit Inc.','LAVERNUE  YEANOPOLOS & JOHN YEANOPOLOS', 'B P  OF MICHIGAN LLC', 
                         'C W Renovators', 'Nagi Investment', 'J & 19 Investments LLC', 'First Choice Agency', '8 & Gratiot Properties', 'DETROIT CITY APARTMENTS',
       "La'Konda's LLC",'BEAL PROPERTIES LLC/STEWART BEAL', '28 ASSOCIATES LLC','GEM GARAGE LLC', 'MICHIGAN OPERA HOUSE',
       'JULIAN C MADISON BUILDING/MILDRED MADISON', 'JOSSAM INC','Linsdau / Simons, Patrick & Andrew', 'BRUSH STREET LLC',
       'CROGHAN ASSOC LLC', 'GRATIOT WHOLESALERS & LIQUIDATORS LLC', "NICK'S GASLIGHT INC", '1167 E GRAND BLVD LLC',
       '1101 WASHINGTON LLC', 'ROCKY INVESTMENT CO', 'BOYDELL INC','TROLLEY PLAZA LLC', 'Skye Investments LLC', 'Livernois Development LLC',
       'Makadm Corp.'])]


    persont=selec21[selec21.violator_name.isin(['Shango, Masoud','AMMORI, FAROOK','YALDOO, ROXI','FARMER, COLLEEN J', 'Starte, LaGarte','Cannon, Doris M.',
                                            'Fakhoury, Rami Sultan', 'Weeks, Dana', 'Sancen, Guilermo', 'Ruiz, Paul W.', 'Morales, Elba',
       'Morani, Joseph A.', 'Rosenthal, Mary','Mourani, Joseph', 'BULLOCK, THOMAS', 'MOHAMED BELHAJALI', 'AMAD GARBOU', 'HANSPARD, TANIYA',
                                            'WALTER GIVINS JR', 'WILLIAMS, RICO', 'MARK MARKU',
       'Williams, Donald', 'Haddad, Victor F.', 'Brown, Harley K.','ANWUNAH, VINCENTO', 'KOZNIACKI, JAMES',
       'Infante, Robert', 'MATTHEWS, JAMES','Troy, Barbara J.', 'Lofton, David H.',  'McGowan, Tony','Attisha, Hazim','Hernandez, Gricel',
                                            'Boykin, Alvin B.','Edwards, Kirk D.', 'Samuels, Elpha  S.', 'Young, Demetris',
                                            'Yaldo, Samira','Tellis, Lugene','Lee, Jian Chao', 'Kim, Young Y.', 'Boyke, Timothy',
                                             'Posell, Georgia','Mahamat, Gouma', 'Sanchez, Jaine','Dabish, Lewis', 'Mitchell, Crystal', 'IVORY, CHRISTOPHER',
                                            'McWilliams, Yvette','Myles, Johnny','Shell, Jerome', 'Simmons, Charlotte','Robinson, Ricco',
                                           'PAIGE, GREGORY','Davis, Leonard', 'Croft Jr., Walter Frank','Hill, Adena A.',
       'Reynolds, Deaon D.','Khalil, Nahla','Hendrix, Roosevelt','Alexander, Jesse','Owe, Ayodele','Jamerson, Rothell', 'Gojcaj, Mark','Yesi, Basim',
                                           'Bryant, Ivory','Nava, Alfredo','Ateman, Kelvin B.', 'Bautista, Keurys Jose Lopez',
       'Alghaiti, Yassin', 'Jarjis, Arkan & Ghanima', 'Kozniacki, James','Srour, Ali', 'Kendrick, Charles', 'Watts, Kathleen Shah',
       'FAKHOURY, RAMI SULTAN', 'SANCEN, GUILERMO','SHELDEN STANLEY','YAN, SHO  QUAN','VIGILANTI, CAROL A.',
       'DOMINICK, JOSEPH',  'MATTISON, GILBERT D','HANO, EDDIE H','JARJIS, ISSAM',  'LEE, JIAN CHAO','ROSS, JEROME', 'ADAMS, PHYLLIS',
        'GREEN, NIKKA', 'AE BUILDING LLC', 'MURRELL, JEANNETTE  ALLEN', 'TYUS, ARTHUR L', 'SAMUELS, WALTER','RIVERA, ADELLA', 
       'GRAY, THEODDEUS', 'BAKKEN, DONALD C','CARTER, JANET', 'COLLIER, HONDA', 'BAKKEN, DONALD C.', 'WAIRE, JON', 'MOULTRIE, MARVIN L',
                           'Wittig, Alexander','CORBETT, JOHN', 'RAGSDALE JR, OLIVER','COLBERT, TODD','MURFF, EDDIS P','AWUSAH, INNOCENT K.',
       'KALE-JONES, KEN','SHILLCUT, VINCENT', 'ALI SALEH, AL MAHMODI','WOLF, HARVEY', 'MOHAMAD H, EL-HADI', 'GAY, CHESTER','BANKS, TERRY',
                                            'TATARIAN, MATTHEW', 'BROWN, WILLIE', 'Bazzi, Rabab Mutapha','BEAUPRE, KENNETH F','RODGERS, JAMES W.',
                                            'LARRY JR, STANLEY','MCALPINE, STEPHEN', 'BAZZI, RABAB MUTAPHA','RODRIGUEZ, JAMES', 'SANDERS, EDWARD L','SMITH, WILLIAM A', 'STEWART, MARLENE', 
        'WINBUSH-BEY, AUNDRE','HOLLEY, REV CHARLES JIM','CARROL, LESLIE B','RODGERS, JAMES W','Belloli, Charles', 'PATTOS, ROBERT',
                                           'Thomas Ryan P.C.','REASON, KATINA L', 'Hermez, Alvin'])]

    businesst['violation_code'].replace('61-5-21', '61-5-211', inplace=True)


    persont['violation_code'].replace('61-5-21', '61-5-212', inplace=True)


    subset21=businesst.append(persont)
    subset21=subset21[lnd]
    
    ddtestc=ddtest.loc[ddtest['violation_code']!= '61-5-21'].copy() # '61-63'

    dtest=ddtestc.append(subset21)
    dtest=dtest[lnd]
    
    tviols=dtest[['agency_name','violation_code', 'violation_description', 'disposition', 'fine_amount',
       'admin_fee', 'state_fee', 'late_fee', 'discount_amount','judgment_amount']]
    tviols['keepcat']=tviols['violation_code'].apply(lambda x: "-".join(x.split("-")[:3]) and x.split('(')[0].strip())

    tviols['keepcat'] = tviols['keepcat'].apply(lambda x: x.split('/')[0].strip())
    tviols['keepcat'] = tviols['keepcat'].apply(lambda x: x.split('.')[0].strip())
    tviols['keepcat'] = tviols['keepcat'].apply(lambda x: x.split(' -')[0].strip())
    
    tl91=tviols.loc[tviols.violation_code.str.startswith(("9-1"))]
    tl91['keepcat']=tl91['keepcat'].apply(lambda x: "-".join(x.split("-")[2:]))
    tl91s=tl91.keepcat.sort_values()
    tl91_p1=tl91.loc[tl91.keepcat.isin(("12","36","43", "45","50","81", "82","83"))] #compliance and certificates
    tl91_p2=tl91.loc[tl91.keepcat.isin(("101","102","103","104","107","108","109","111","112","113"))] 
    tl91_p3=tl91.loc[tl91.keepcat.isin(("105","353","354","355"))]#rodents
    tl91_p4=tl91.loc[tl91.keepcat.isin(("201","202","204","205","206","208","209","212","213","216","218","221"))]
    tl91_p5=tl91.loc[tl91.keepcat.isin(("301","303","304","308","309","310","311","332","405","406"))]
    tl91_p6=tl91.loc[tl91.keepcat.isin(("432","444","468","469","471"))]
    tl91_p7=tl91.loc[tl91.keepcat.isin((["110"]))] #motor vehicles
    for col_val, df in [('tl91_p1', tl91_p1),('tl91_p2', tl91_p2),('tl91_p3', tl91_p3),('tl91_p4', tl91_p4),('tl91_p5',tl91_p5),('tl91_p6',tl91_p6),('tl91_p7',tl91_p7)]:
        df['Category'] = col_val
    tl91_all = pd.concat([tl91_p1,tl91_p2,tl91_p3,tl91_p4,tl91_p5,tl91_p6,tl91_p7])
    
    tl22=tviols.loc[tviols.violation_code.str.startswith(("22-2"))]

    tl22['keepcat']=tl22['keepcat'].apply(lambda x: "-".join(x.split("-")[2:]))
    tl22s=tl22.keepcat.sort_values()
    
    tl22_p1=tl22.loc[tl22.keepcat.isin(("16","17"))] # not separating waste
    tl22_p2=tl22.loc[tl22.keepcat.isin((["18"]))] # burning waste
    tl22_p3=tl22.loc[tl22.keepcat.isin((["21"]))] # dog fowling
    tl22_p4=tl22.loc[tl22.keepcat.isin(("22","23","38","41","42","43","44","45","48","49","55","56","61"))] # container issues
    tl22_p5=tl22.loc[tl22.keepcat.isin(("83","84","85","87","88"))] # dumping waste
    tl22_p6=tl22.loc[tl22.keepcat.isin(("91","92","93","94"))] # transporting waste
    tl22_p7=tl22.loc[tl22.keepcat.isin(("96","97"))] # littering
    for col_val, df in [('tl22_p1', tl22_p1),('tl22_p2', tl22_p2),('tl22_p3', tl22_p3),('tl22_p4', tl22_p4),('tl22_p5',tl22_p5),('tl22_p6',tl22_p6),('tl22_p7',tl22_p7)]:
        df['Category'] = col_val
    tl22_all = pd.concat([tl22_p1,tl22_p2,tl22_p3,tl22_p4,tl22_p5,tl22_p6,tl22_p7])
    
    tl61=tviols.loc[tviols.violation_code.str.startswith(("61"))]
    tl61['keepcat']=tl61['keepcat'].apply(lambda x: "-".join(x.split("-")[1:]))
    tl61s=tl61.keepcat.sort_values()
    
    tl61_p1=tl61.loc[tl61.keepcat.isin(("4-32","4-33","4-35","4-37","5-14","5-18","5-19","5-20","13-102","45","47", "63"))] # permits and noncompliance
    tl61_p2=tl61.loc[tl61.keepcat.isin(("5-211","90","100","101","102","103","104","112","118","119","130"))] # non residential
    tl61_p3=tl61.loc[tl61.keepcat.isin(("5-212","8-127","8-27","8-47","14-175","81","82","83","84","85","86"))] # residential
    for col_val, df in [('tl61_p1', tl61_p1),('tl61_p2', tl61_p2),('tl61_p3', tl61_p3)]:
        df['Category'] = col_val
    tl61_all = pd.concat([tl61_p1,tl61_p2,tl61_p3])
    
    tl4=tviols.loc[tviols.violation_code.str.startswith(('19410901','19420901', '19450901', '19830901', '19840901', '19850901',
       '19910901','20130901', '20160901', '20180901'))]
    tl4s=tl4.keepcat.sort_values()
    tl4_p1=tl4.loc[tl4.keepcat.isin(("19410901","19420901","19450901"))] # abate
    tl4_p2=tl4.loc[tl4.keepcat.isin(("19830901", "19910901"))]
    tl4_p3=tl4.loc[tl4.keepcat.isin((["20130901"]))]
    tl4_p4=tl4.loc[tl4.keepcat.isin(["20180901"])]
    for col_val, df in [('tl4_p1', tl4_p1),('tl4_p2', tl4_p2),('tl4_p3', tl4_p3),('tl4_p4', tl4_p4),]:
        df['Category'] = col_val
    tl4_all = pd.concat([tl4_p1,tl4_p2,tl4_p3,tl4_p4])
    
    # Test set
    viols=dtrain[['agency_name','violation_code', 'violation_description', 'disposition', 'fine_amount',
       'admin_fee', 'state_fee', 'late_fee', 'discount_amount','judgment_amount','grafitti_status', 'compliance']]
    viols['keepcat']=viols['violation_code'].apply(lambda x: "-".join(x.split("-")[:3]) and x.split('(')[0].strip())
    viols['keepcat'] = viols['keepcat'].apply(lambda x: x.split('/')[0].strip())
    viols['keepcat'] = viols['keepcat'].apply(lambda x: x.split('.')[0].strip())
    
    fl91=viols.loc[viols.violation_code.str.startswith(("9-1"))]
    fl91['keepcat']=fl91['keepcat'].apply(lambda x: "-".join(x.split("-")[:3]))
    fl91['keepcat']=fl91['keepcat'].apply(lambda x: "-".join(x.split("-")[2:]))
    fl91["keepcat"] = pd.to_numeric(fl91["keepcat"])
    fl91s=fl91.keepcat.sort_values()
    fl91_p1=fl91.loc[fl91.keepcat.isin(("12","36","43", "45","46","50","81","82","83"))] #compliance and certificates
    fl91_p2=fl91.loc[fl91.keepcat.isin(("101","102","103","104","106","107","108","109","111","112","113"))] 
    fl91_p3=fl91.loc[fl91.keepcat.isin(("105","351","352","353","354","355"))]#rodents
    fl91_p4=fl91.loc[fl91.keepcat.isin(("201","202","203","204","205","206","207","208","209","210","211","212","213","214","215","216","219","220","221"))]
    fl91_p5=fl91.loc[fl91.keepcat.isin(("301","302","303","304","305","306","307","308","309","310","311","331","332","333","375","377","381","405","406"))]
    fl91_p6=fl91.loc[fl91.keepcat.isin(("432","433","434","439","440","441","442","443","444","462","464","465","468","469","471","474","476","477","478","479"))]
    fl91_p7=fl91.loc[fl91.keepcat.isin((["110"]))] #motor vehicles
    fl91_p8=fl91.loc[fl91.keepcat.isin(("502","503"))] #elevators
    for col_val, df in [('fl91_p1', fl91_p1),('fl91_p2', fl91_p2),('fl91_p3', fl91_p3),('fl91_p4', fl91_p4),('fl91_p5',fl91_p5),('fl91_p6',fl91_p6),('fl91_p7',fl91_p7),('fl91_p8',fl91_p8)]:
        df['Category'] = col_val
    fl91_all = pd.concat([fl91_p1,fl91_p2,fl91_p3,fl91_p4,fl91_p5,fl91_p6,fl91_p7,fl91_p8])
    
    fl22=viols.loc[viols.violation_code.str.startswith(("22-2"))]
    fl22['keepcat']=fl22['keepcat'].apply(lambda x: "-".join(x.split("-")[2:]))
    fl22['keepcat'] = fl22['keepcat'].apply(lambda x: x.strip()[:2])
    fl22["keepcat"] = pd.to_numeric(fl22["keepcat"])
    fl22s=fl22.keepcat.sort_values()
    fl22_p1=fl22.loc[fl22.keepcat.isin(("16","17"))] # not separating waste
    fl22_p2=fl22.loc[fl22.keepcat.isin(("18","19","20"))] # burning waste
    fl22_p3=fl22.loc[fl22.keepcat.isin((["21"]))] # dog fowling
    fl22_p4=fl22.loc[fl22.keepcat.isin(("22","23","38","41","42","43","44","45","49","53","55","56","61"))] # container issues
    fl22_p5=fl22.loc[fl22.keepcat.isin(("83","84","85","87","88"))] # dumping waste
    fl22_p6=fl22.loc[fl22.keepcat.isin(("91","93","94"))] # transporting waste
    fl22_p7=fl22.loc[fl22.keepcat.isin(("96","97"))] # littering
    for col_val, df in [('fl22_p1', fl22_p1),('fl22_p2', fl22_p2),('fl22_p3', fl22_p3),('fl22_p4', fl22_p4),('fl22_p5',fl22_p5),('fl22_p6',fl22_p6),('fl22_p7',fl22_p7)]:
        df['Category'] = col_val
    fl22_all = pd.concat([fl22_p1,fl22_p2,fl22_p3,fl22_p4,fl22_p5,fl22_p6,fl22_p7])
    
    fl61=viols.loc[viols.violation_code.str.startswith(("61"))]
    fl61['keepcat'] = fl61['keepcat'].apply(lambda x: x.strip()[3:])
    fl61s=fl61.keepcat.sort_values(ascending=True)
    fl61_p1=fl61.loc[fl61.keepcat.isin(("4-32","4-33","4-35","4-37","5-18","5-19","45","47", "63"))] # permits and noncompliance
    fl61_p2=fl61.loc[fl61.keepcat.isin(("5-211","90","101","104","111","114","118","120","121","130"))] # non residential
    fl61_p3=fl61.loc[fl61.keepcat.isin(("5-212","8-127","8-27","8-47","14-175","14-176","14-452","80","81","82","83","84","86"))] # residential
    for col_val, df in [('fl61_p1', fl61_p1),('fl61_p2', fl61_p2),('fl61_p3', fl61_p3)]:
        df['Category'] = col_val
    fl61_all = pd.concat([fl61_p1,fl61_p2,fl61_p3])
    fl4=viols.loc[viols.violation_code.str.startswith(('19420901', '19450901', '19830901', '19840901', '19850901',
       '20130901', '20160901', '20180901'))]
    fl4s=fl4.keepcat.sort_values()
    fl4_p1=fl4.loc[fl4.keepcat.isin(("19420901", "19450901"))] # emergency
    fl4_p2=fl4.loc[fl4.keepcat.isin(("19830901", "19840901","19850901"))]
    fl4_p3=fl4.loc[fl4.keepcat.isin(("20130901", "20160901"))]
    fl4_p4=fl4.loc[fl4.keepcat.isin(["20180901"])]
    for col_val, df in [('fl4_p1', fl4_p1),('fl4_p2', fl4_p2),('fl4_p3', fl4_p3),('fl4_p4', fl4_p4),]:
        df['Category'] = col_val
    fl4_all = pd.concat([fl4_p1,fl4_p2,fl4_p3,fl4_p4])
    
    traindf=pd.concat([fl91_all,fl22_all,fl61_all,fl4_all])
    
    # Train and Test sets.
    traindf=traindf[['agency_name','Category', 'disposition', 'fine_amount','late_fee', 'judgment_amount', 'compliance']]
    testdf=pd.concat([tl91_all,tl22_all,tl61_all,tl4_all])

    testdf=testdf[['agency_name','Category', 'disposition', 'fine_amount', 'late_fee', 'judgment_amount']]
    
    traindf["Category"] = traindf["Category"].astype('category')
    testdf["Category"] = testdf["Category"].astype('category')
    
    testdf['disposition'] = testdf['disposition'].replace('Responsible - Compl/Adj by Default', 'Responsible by Default')
    testdf['disposition'] = testdf['disposition'].replace('Responsible (Fine Waived) by Admis', 'Responsible by Admission')
    testdf['disposition'] = testdf['disposition'].replace('Responsible - Compl/Adj by Determi', 'Responsible by Determination')
    testdf['disposition'] = testdf['disposition'].replace('Responsible by Dismissal', 'Responsible (Fine Waived) by Deter')
    traindf.disposition = traindf.disposition.astype('category')
    testdf.disposition = testdf.disposition.astype('category')
    
    traindf.agency_name = traindf.agency_name.astype('category')
    testdf.agency_name = testdf.agency_name.astype('category')
    
    # Using dummies    
       
    d_traindf=pd.get_dummies(traindf)
    d_testdf=pd.get_dummies(testdf)
    
    d_newtest=d_testdf.reindex(columns = d_traindf.columns, fill_value=0)
    d_ntest=d_newtest.drop('compliance', axis='columns')# new test set
    
    # GridSearch best estimators and modelling
    X_trainb = d_traindf.drop('compliance', axis='columns')
    y_trainb = d_traindf['compliance']
    X_testb = d_ntest
    
    X_train, X_test, y_train, y_test = train_test_split(X_trainb, y_trainb,test_size=0.30, random_state=4) 
    grc= GradientBoostingClassifier(learning_rate = 0.1, max_depth = 3, n_estimators=200).fit(X_train, y_train)
    
    y_pred_grd = grc.predict_proba(d_ntest)[:, 1]
    fpr_grc, tpr_grc, _ = roc_curve(y_test, y_pred_grd)
    roc_auc_grc = auc(fpr_grc, tpr_grc)
    
    y_pred_grc = grc.predict_proba(X_testb)[:, 1]


    dftest['compliance'] = y_pred_grc.astype(np.float32)
    
    clf = RandomForestClassifier(n_estimators = 350, max_depth=6, max_features = None, random_state = 4).fit(X_train, y_train)
    
    y_pred_clf = clf.predict_proba(d_ntest)[:, 1].astype(np.float32)

    dftest['compliance'] = y_pred_clf
    dftest['compliance'] = y_pred_grc

    ans= pd.Series(dftest.compliance)
    # ROC Curve
    #y_pred_grc = grc.predict_proba(X_test)[:, 1]
   # fpr_grc, tpr_grc, _ = roc_curve(y_test, y_pred_grc)
    #roc_auc_grc = auc(fpr_grc, tpr_grc)
    
    return ans.sort_values()



import warnings
warnings.filterwarnings('ignore')

blight_model()

ticket_id
376360    0.046816
376323    0.046816
376340    0.046816
376191    0.046816
376190    0.046816
374953    0.046816
374958    0.046816
376223    0.046816
376230    0.046816
376361    0.046816
376221    0.046816
376358    0.046816
376359    0.046816
376227    0.046816
376226    0.046816
376276    0.046816
376218    0.046816
376368    0.046816
376012    0.046816
376187    0.046816
376104    0.046816
376247    0.046816
375893    0.046816
376307    0.046816
375801    0.046816
375794    0.046816
376158    0.046816
375992    0.046816
375863    0.046816
375987    0.046816
            ...   
375960    0.118079
375372    0.118079
375766    0.118079
375768    0.118079
375487    0.118079
375376    0.118079
375374    0.118079
375389    0.118079
375770    0.118079
375762    0.118079
375771    0.118079
375761    0.118079
376112    0.118079
375918    0.118079
375371    0.118079
375760    0.118079
375772    0.118079
375763    0.118079
375295    0.118079
376201    0.118079
375391    0.118079
37