## CINC Technical Requirements
1. CINC HOIDs must start with the association code.
2. HOIDs must not contain any special characters; they must be alphanumeric only.
3. HOIDs may not be more than 16 characters in length.
4. HOIDs should be the shortest possible account number that adheres to these requirements. Fewer than 12 characters is ideal.
5. No duplicate HOIDs may exist in a data set.
6. HOIDs must represent the property address using the most valid identifying information. This may include the street number, an abbreviation of the street name (typically initials), the unit number, and the lot number.
7. The provided data sets are examples for testing your script. Your script should produce valid HOIDs from any reasonable data set that includes the five provided elements (association code, street number, street name, unit, and lot).
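
Requirements 1 to 3 can be checked mechanically for a single HOID. The following is a minimal sketch, not part of the original script; the function name `hoid_problems` and the constant `MAX_LEN` are hypothetical names introduced here for illustration.

```python
import re

MAX_LEN = 16  # requirement 3: at most 16 characters


def hoid_problems(hoid: str, association_code: str) -> list:
    """Return a list of rule violations for one HOID (empty list means it passes)."""
    problems = []
    # Requirement 1: the HOID must start with the association code
    if not hoid.startswith(association_code):
        problems.append("does not start with the association code")
    # Requirement 2: alphanumeric characters only (an empty HOID also fails here)
    if not re.fullmatch(r"[A-Za-z0-9]+", hoid):
        problems.append("is empty or contains non-alphanumeric characters")
    # Requirement 3: length limit
    if len(hoid) > MAX_LEN:
        problems.append("is longer than 16 characters")
    return problems


print(hoid_problems("AB123MS1", "AB"))
print(hoid_problems("AB-123 MS#1", "AB"))
```

Returning a list of messages rather than a single boolean makes it easier to report why a given row failed when the check is applied across a whole data set.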
						
						
						

In [3]:

import pandas as pd

# Specify the path to your CSV file
csv_file_path = 'cinc_dataset2.csv'

# Read the CSV file into a DataFrame (read_csv already returns one)
df = pd.read_csv(csv_file_path)

# Drop the old CINC HOID column
df = df.drop('CINC HOID', axis=1)

# Replace NaN values with empty strings
df = df.fillna('')

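# Build the new CINC HOID from the association code, street number,
# street-name initials, unit number, and lot number.
# Parentheses replace the backslash continuations, which left a stray
# unary "+" at the start of each continued line.
df['CINC HOID'] = (
    df['Association Code'].astype(str)
    + df['Street Number'].astype(str)
    + df['Street Name'].apply(lambda x: ''.join(word[0] for word in str(x).split()))
    + df['Unit Number'].astype(str)
    # Strip everything that is not a letter or digit (spaces included)
    + df['Lot Number'].astype(str).str.replace('[^a-zA-Z0-9]', '', regex=True)
)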


# Set the display option to show all rows and columns
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)


# Display the DataFrame 
print(df.to_string(index=False))


Association Code  Street Number Street Name  Unit Number Lot Number CINC HOID
              AB            123    Main St.            1             AB123MS1
              AB            123    Main St.            2             AB123MS2
              AB            123    Main St.            3             AB123MS3
              AB            123    Main St.            4             AB123MS4
              AB            123    Main St.            5             AB123MS5
              AB            123    Main St.            6             AB123MS6
              AB            123    Main St.            7             AB123MS7
              AB            123    Main St.            8             AB123MS8
              AB            123    Main St.            9             AB123MS9
              AB            123    Main St.           10            AB123MS10
              AB            123    Main St.           11            AB123MS11
              AB            123    Main St.           12        
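
Requirement 7 asks for a script that works on any reasonable data set, so it can help to isolate the HOID construction in a function that tolerates missing values and float-typed numbers. The sketch below is one possible approach under stated assumptions: `build_hoid` and `_clean` are hypothetical helper names, and the small DataFrame is illustrative data shaped like the output above, not the contents of `cinc_dataset2.csv`.

```python
import re

import pandas as pd


def _clean(value) -> str:
    """Return value as an alphanumeric string; missing values become ''."""
    if pd.isna(value):
        return ""
    # Columns with gaps are often read as floats, so 123.0 should become "123"
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    return re.sub(r"[^A-Za-z0-9]", "", str(value))


def build_hoid(assoc, street_number, street_name, unit, lot) -> str:
    """Concatenate the five elements, abbreviating the street name to initials."""
    name = "" if pd.isna(street_name) else str(street_name)
    initials = "".join(_clean(word)[:1] for word in name.split())
    return _clean(assoc) + _clean(street_number) + initials + _clean(unit) + _clean(lot)


# Illustrative rows only
demo = pd.DataFrame({
    "Association Code": ["AB", "AB"],
    "Street Number": [123, 123],
    "Street Name": ["Main St.", "Main St."],
    "Unit Number": [1, 10],
    "Lot Number": [None, "L-4"],
})
columns = ["Association Code", "Street Number", "Street Name", "Unit Number", "Lot Number"]
demo["CINC HOID"] = [build_hoid(*row) for row in demo[columns].itertuples(index=False, name=None)]
print(demo["CINC HOID"].tolist())
```

Cleaning each element separately keeps punctuation such as the period in "St." or the hyphen in a lot number out of the result, which is what requirement 2 needs.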

# Generate and add an 'ID' column
#df['UniqueID'] = range(1, len(df) + 1)

#### You can auto-generate a sequential unique ID with the code below and append it to the CINC HOID where extra uniqueness is needed

In [None]:
# Generate and add a sequential 'ID' column (1, 2, ..., number of rows)
df['UniqueGeneratedID'] = range(1, len(df) + 1)

### No duplicate HOIDs may exist in a data set.
### The script below checks the CINC HOID column for duplicates.

In [4]:

# Rows whose CINC HOID repeats an earlier row (the first occurrence is not flagged)
duplicate_rows = df[df.duplicated(subset=['CINC HOID'])]
duplicate_rows

Unnamed: 0,Association Code,Street Number,Street Name,Unit Number,Lot Number,CINC HOID
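
If the check does return rows, the duplicates still have to be resolved. The sketch below shows one possible way to do that, assuming a letter suffix is acceptable; the original notebook does not specify a resolution strategy, and the sample HOIDs are illustrative only. A letter is used rather than a digit because appending "1" to `AB123MS1` would produce `AB123MS11`, which could collide with unit 11.

```python
import pandas as pd

# Illustrative data only: the first HOID appears three times
df = pd.DataFrame({"CINC HOID": ["AB123MS1", "AB123MS1", "AB123MS2", "AB123MS1"]})

# keep=False flags every member of a duplicate group, not just the later ones
all_dupes = df[df.duplicated(subset=["CINC HOID"], keep=False)]
print(len(all_dupes))

# Number the repeats within each HOID: 0 for the first occurrence, then 1, 2, ...
repeat_no = df.groupby("CINC HOID").cumcount()

# Append A, B, ... to repeats only; this simple mapping handles up to 26 repeats
suffix = repeat_no.map(lambda n: "" if n == 0 else chr(ord("A") + n - 1))
df["CINC HOID"] = df["CINC HOID"] + suffix

# Re-check: a suffix could in principle collide with an existing HOID
print(df["CINC HOID"].is_unique)
```

The suffix adds one character, so the 16-character limit should be re-validated after this step.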
