## Check the setup and connect to the database

In [2]:
%run "../check_setup.ipynb"

SAP HANA Client for Python: 2.23.25010300
Connected to SAP HANA db version 4.00.000.00.1736867381 (fa/CE2024.40) 
at c5889dd5-e0f6-4930-8408-94d53ca61dbf.hna0.prod-us10.hanacloud.ondemand.com:443 as CODEJAMHANAML40
Current time on the SAP HANA server: 2025-01-27 19:21:15.143000


In [3]:
myconn.get_tables(schema='NHTSA')

Unnamed: 0,TABLE_NAME
0,COMPLAINTS


# Car complaints data

The __National Highway Traffic Safety Administration (NHTSA)__ is part of the U.S. Department of Transportation.
Complaint information entered into the __vehicle owner's complaint database__ of NHTSA’s Office of Defects Investigation is used with other data sources to identify __safety issues__ that warrant investigation and to determine whether a safety-related defect trend exists. Complaint information is also analyzed to monitor existing recalls for proper scope and adequacy. The NHTSA provides a large dataset of complaints related to cars in the US: [https://www.nhtsa.gov/nhtsa-datasets-and-apis#complaints](https://www.nhtsa.gov/nhtsa-datasets-and-apis#complaints).
For this demo scenario, we've loaded the 2024 complaints data; for detailed instructions, see the appendix.


In [9]:
# Create a HANA dataframe for the complaints data, pre-loaded into SAP HANA Cloud
hdf_complaints=myconn.table('COMPLAINTS', schema='NHTSA')
# hdf_complaints.get_table_structure()

In [21]:
# Get an overview of the complaints data
import pandas as pd
pd.set_option('max_colwidth', None) 
display(
    hdf_complaints.filter("""PROD_TYPE='V'""") # Vehicle-related complaint
    .select('CMPLID', 'MAKETXT', 'MODELTXT', 'YEARTXT', 'COMPDESC', 'CDESCR')
    .head(1).collect().T
)

Unnamed: 0,0
CMPLID,1639185
MAKETXT,FORD
MODELTXT,ESCAPE
YEARTXT,2020
COMPDESC,ENGINE
CDESCR,"SINCE DAY ONE OF OWNERSHIP, IT HAS A VIBRATION IN THE STEERING WHEN APPROACHING 25MPH THEN THE VIBRATION CONTINUES ALL THE WAY TO 75MPH. ALSO AT APPROX. 25MPH AN LOW HUMMING SOUND STARTS. THE HUMMING SOUND GETS LOADER AS SPEED INCREASES ALONG WITH THE ROAD NOISE. THE HUMMING AND VIBRATION STOP AS SOON AS YOU TAKE YOUR FOOT OFF THE ACCELERATOR. BUT RESUMES AGAIN AS SOON AS THE ACCELERATOR IS PRESSED. THIS VIBRATION FEELS AS IF A WHEEL IS LOOSE OR COMING OFF OR THE THE ENGINE AND TRANSMISSION ARE GOING TO FAIL CAUSING A SERIOUS PROBLEM. THIS BY NO MEANS IS NORMAL. BOTH DEALERSHIPS PERSONNEL THAT DROVE THE VEHICAL SAID ITS NOT NORMAL."


In [28]:
# Let's filter on specific component-groups, for detailed classification analysis
hdf_carcomplaints=(hdf_complaints
    .select('CMPLID', 'MFR_NAME', 'MAKETXT', 'MODELTXT', 'YEARTXT', 'CRASH', 'FIRE', 'STATE', 'CMPL_TYPE',
            'ANTI_BRAKES_YN', 'CRUISE_CONT_YN', 'DRIVE_TRAIN',   'VEHICLES_TOWED_YN', 'CDESCR', 'COMPDESC')
    .filter('''COMPDESC IN ('AIR BAGS','ELECTRICAL SYSTEM', 'SERVICE BRAKES','STEERING')'''))

hdf_carcomplaints.count()

122448

In [31]:
pd.set_option('max_colwidth', None) 
display(hdf_carcomplaints.head(1).collect().T)

Unnamed: 0,0
CMPLID,1766833
MFR_NAME,Ford Motor Company
MAKETXT,FORD
MODELTXT,ESCAPE
YEARTXT,2009
CRASH,N
FIRE,N
STATE,MI
CMPL_TYPE,IVOQ
ANTI_BRAKES_YN,N


## Text splitting, preparing complaints description text for vectorization

Text embedding models and large language models (LLMs) typically have token-length limits, so managing text length before passing text to such models is a frequent preprocessing task. For that purpose, a new text splitting function in __hana_ml.text.text_splitter__ has been introduced; it is explained in more detail in the following blog post: [Text chunking - an exciting new NLP function in SAP HANA Cloud](https://community.sap.com/t5/technology-blogs-by-sap/text-chunking-an-exciting-new-nlp-function-in-sap-hana-cloud/ba-p/13958766).
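To build some intuition for what chunk size and overlap mean, here is a minimal pure-Python sketch. It is not the hana-ml implementation and does not reproduce its recursive strategy: it simply packs whitespace-separated words greedily into chunks and repeats the trailing words of each chunk at the start of the next. The function name `split_with_overlap` is hypothetical; the parameter names `chunk_size` and `overlap` are chosen to mirror the `TextSplitter` arguments used in this notebook.

```python
def split_with_overlap(text, chunk_size=512, overlap=64):
    """Greedy word packing (illustrative only, not the hana-ml algorithm).

    Each chunk holds at most `chunk_size` characters, except when a single
    word is longer than that. Each new chunk starts with trailing words of
    the previous chunk totalling at most `overlap` characters.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    chunks = []
    current = []  # words of the chunk being built
    for word in text.split():
        if current and len(" ".join(current + [word])) > chunk_size:
            chunks.append(" ".join(current))
            # Carry over trailing words that fit into the overlap budget.
            carried = []
            for prev in reversed(current):
                if len(" ".join([prev] + carried)) > overlap:
                    break
                carried.insert(0, prev)
            # If overlap plus the next word would not fit, start fresh.
            if len(" ".join(carried + [word])) > chunk_size:
                carried = []
            current = carried
        current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


demo = "one two three four five six seven eight nine ten"
print(split_with_overlap(demo, chunk_size=20, overlap=8))
# ['one two three four', 'four five six seven', 'seven eight nine ten']
```

In the demo, the last word of each chunk reappears at the start of the next one, so a sentence cut at a chunk boundary keeps some of its context in both chunks.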

In [38]:
# Determine the character length of each complaint text using the SQL LENGTH function.
# For Western languages, the character length divided by 3 or 4 gives an approximate token count.
# Note: the text analysis function applied above also determines the token length specifically.

hdf_carcomplaints.select('CMPLID', ('LENGTH("CDESCR")', 'LEN_CDESCR')).sort('LEN_CDESCR', desc=True).head(100).collect()

Unnamed: 0,CMPLID,LEN_CDESCR
0,1918524,2048
1,1660150,2048
2,1833976,2048
3,1840207,2048
4,1917881,2048
...,...,...
95,1993051,2048
96,1995730,2048
97,1745345,2048
98,1768898,2048
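The rule of thumb from the cell above (character length divided by 3 or 4) can be turned into a quick estimate. The helper below is a hypothetical sketch, not part of hana-ml, and the divisors are only the rough heuristic stated in the comment; actual token counts depend on the tokenizer of the model being used.

```python
import math


def estimate_token_range(n_chars, chars_per_token=(4, 3)):
    """Return a rough (low, high) token estimate for a text of n_chars characters.

    Assumes roughly 3 to 4 characters per token, a heuristic for Western
    languages; this is not a real tokenizer.
    """
    low = math.ceil(n_chars / chars_per_token[0])
    high = math.ceil(n_chars / chars_per_token[1])
    return low, high


# The longest complaint descriptions above have 2048 characters.
print(estimate_token_range(2048))  # (512, 683)
```

Under this heuristic, the longest descriptions come to roughly 500 to 700 tokens.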


In [40]:
hdf_carcomplaints.filter("""CMPLID=1918524""").select("CDESCR").collect().T

Unnamed: 0,0
CDESCR,"My name is [XXX] and I have been a Nissan Consumer for over 10 years, and this is the second time I have had major issues with my breaks. Unfortunately, I did not send an email when the 1st occurrence happened but, in my brand, new Rouge around 2017, my front and back breaks went completely out and had to be replaced at 6 months, but I continued to patronize your company. This occurrence I am sending an email regarding my 2022 Nissan Pathfinder I purchased from Passport Nissan, Marlowe Heights. I called the consumer customer service number on 8/12 and was provided case # [XXX}. Nissan Consumer line says, oh, there is nothing we can do because your vehicle is not under warranty and there has not been a recall! I brought my truck in for an oil change and tire rotation (which is part of the extended warranty when I purchased my truck) on July 11, 2003, and was not informed that my brakes needed any attention. The week of 8/7/2023 I noticed my brakes were really loud when I went in reverse so I looked up when should a car/truck need new brakes seeing as though I have only had this vehicle for just a year in July. It should brake should not be needed for at least 30-60,000 miles, here I am at 12, 483 miles. I called the dealership and was informed this is likely wear and tear and that I would be responsible for the cost of new breaks if needed. I am baffled because the point of me leasing a new vehicle is an attempt to avoid major costs such as brake and engine concerns before my 3-year lease is up. Yes, I leased 3 years prior to this lease a year ago! I make an appointment and take my truck to Nissan of Marlow Heights and informed the worker of my concern and he immediately tells me, oh, it’s just a noise your bakes make early in the morning, my Toyota does the same thing in the mornings. 
I explained to him that it’s not just in the morning but when I go in reverse, and I demonstrated this concern and he was like, oh no that is not norm INFORMATION REDACTED PURSUANT TO THE FREEDOM OF INFORMATION ACT (FOIA), 5 U."


In [43]:
# Applying the Text Splitter with recursive-splitting, available with hana-ml 2.23
from hana_ml.text.text_splitter import TextSplitter

splitter = TextSplitter(split_type='recursive', chunk_size=512, overlap=64)

splitted_text = splitter.split_text(hdf_carcomplaints.select('CMPLID', 'CDESCR').head(10), order_status=True)
#print(splitted_text.shape)
display(splitter.statistics_.collect())

display(splitted_text.collect())

Unnamed: 0,STAT_NAME,STAT_VALUE
0,GLOBAL_SEPARATOR_LIST,"{""Separator"":""[\n\n,\n, ]""}"


Unnamed: 0,CMPLID,SUB_ID,CONTENT
0,1766833,0,"Subject vehicle was a 2009 Ford Escape with 74,123 miles. Driving the vehicle the front passenger lower control arm separated from the k-frame/cradle causing the vehicle to lose control. Loss of steering was experienced as well as tire lock up from the front tire lodging against the back of the fender. The CV shaft also separated during the failure resulting in loss of transmission engagement. There was no warning prior to the event, the failure was sudden and immediate. The loss of vehicle control was"
1,1766833,1,"was sudden and immediate. The loss of vehicle control was extreme. Failure is identical to RCRIT-14V165-9596 for the 01-04 Escapes. The reason the recall was only isolated to the 01-04 Escapes is unknown. In my opinion, whatever PCA actions or supplier related issues identified by Ford Motor Company during the prior recall investigation to support the recall only affecting 01-04 model years was either incomplete or inaccurate. The frame rust through condition extends beyond the model years listed in the"
2,1766833,2,through condition extends beyond the model years listed in the recall and was not resolved as shown by subsequent failures on vehicles outside of the recall model year window. No mechanism was put in place to prevent the control arm separation in the later model year Escapes to prevent the same safety issue from occurring. I was unable to obtain a front cradle from a local salvage yard that did not exhibit perforation present or starting supporting this condition is occurring on all model year Escapes from
3,1766833,3,this condition is occurring on all model year Escapes from 2001 to 2012 in corrosion prone areas. The recall should be extended to all model years involved to prevent potential accidents.
4,1766844,4,"Subject vehicle is a 2009 Ford Escape with 74,123 miles. Driving the vehicle the lower control arm separated from the front frame otherwise known as the engine cradle or subframe. The failure is identical to the subframe failures on the 2001 - 2004 Ford Escapes currently under recall. The failure was immediate with no prior warning. The vehicle exhibited loss of steering control, wheel lock up from the front tire contacting the back of the wheel opening, and loss of transmission engagement from cv joint"
5,1766844,5,"wheel opening, and loss of transmission engagement from cv joint separation. In my opinion, whatever PCA actions or quality root cause was identified by Ford Motor Company during the prior recall investigation isolating the condition to 2001 - 2004 Escapes was either incomplete or inaccurate. This is supported by the same failure occurring in Escapes outside of that model year window. No mechanism was added after the 2004 Escapes were produced to prevent this failure from occurring. Images attached show"
6,1766844,6,to prevent this failure from occurring. Images attached show the failure that was experienced. I was unable to obtain a good used frame locally that did not exhibit signs of corrosion with perforation present or starting. This further supports the failure mode is beyond the scope of the initial recall. The recall should be expanded to prevent future accidents that could have the potential for injury or death.
7,1871568,7,"The vehicle is only 6 years old and the audio control module stopped working, rendering me without a radio. Southgate Ford dealership in Southgate, MI informed me that this is a known issue. I would like to see accountability and have this audio control module recalled. A search online shows many users experience this issue. Not having a working radio is a safety issue. Having access to emergency radio bulletins while driving is imperative. A known issue should be recalled."
8,1900521,8,"I have been the only driver/owner since May of 2020. Vehicle has not been in any type of accident. 2020 Ford Escape has approximately 18,000 miles. I did purchase (bumper to bumper) warranty. Vehicle is scheduled for service on July 10, 2023. Ford dealerships are understaffed and can only service vehicles on urgency of issue or issues. July date was soonest and was from the 3rd Ford dealership I called. Issues started on May 13, 2023. While vehicle is parked and while it’s being driven, the following"
9,1900521,9,"vehicle is parked and while it’s being driven, the following warnings occur: •AWD Off •AWD Service Required •Powertrain Malfunction •Powertrain Service Required •Pre-collision Assist Not Available •Service Advance Trac •See Manual •Passenger window opens/closes on its own When vehicle is in park: •Power Lift-gate opens halfway or does not open •Doors unlock and lock on their own"
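One way to sanity-check the `overlap=64` setting is to measure how much text two consecutive chunks of the same complaint actually share. The helper below is a hypothetical plain-Python sketch (not a hana-ml function) that returns the longest suffix of one chunk that is also a prefix of the next; the sample strings are the end of chunk 0 and the start of chunk 1 from the output above.

```python
def shared_overlap(prev_chunk, next_chunk):
    """Return the longest suffix of prev_chunk that is also a prefix of next_chunk."""
    max_len = min(len(prev_chunk), len(next_chunk))
    for size in range(max_len, 0, -1):
        if prev_chunk.endswith(next_chunk[:size]):
            return next_chunk[:size]
    return ""


# Tail of chunk 0 and head of chunk 1 for CMPLID 1766833 (shortened).
tail_of_chunk_0 = "the failure was sudden and immediate. The loss of vehicle control was"
head_of_chunk_1 = "was sudden and immediate. The loss of vehicle control was extreme."

common = shared_overlap(tail_of_chunk_0, head_of_chunk_1)
print(repr(common), len(common))
```

For this pair the shared text is 57 characters long, which stays within the configured overlap of 64. With the collected pandas result, the same helper could be applied to consecutive `CONTENT` values that share a `CMPLID`.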


In [51]:
# Applying the Text Splitter with recursive splitting again, this time passing a custom separator ('[.]')
from hana_ml.text.text_splitter import TextSplitter

splitter = TextSplitter(split_type='recursive', chunk_size=512, overlap=64, separator='[.]')

splitted_text = splitter.split_text(hdf_carcomplaints.select('CMPLID', 'CDESCR').head(10), order_status=True)
display(splitter.statistics_.collect())
display(splitted_text.collect())

Unnamed: 0,STAT_NAME,STAT_VALUE
0,GLOBAL_SEPARATOR_LIST,"{""Separator"":""[\n\n,\n, ]""}"


Unnamed: 0,CMPLID,SUB_ID,CONTENT
0,1766833,0,"Subject vehicle was a 2009 Ford Escape with 74,123 miles. Driving the vehicle the front passenger lower control arm separated from the k-frame/cradle causing the vehicle to lose control. Loss of steering was experienced as well as tire lock up from the front tire lodging against the back of the fender. The CV shaft also separated during the failure resulting in loss of transmission engagement. There was no warning prior to the event, the failure was sudden and immediate. The loss of vehicle control was"
1,1766833,1,"was sudden and immediate. The loss of vehicle control was extreme. Failure is identical to RCRIT-14V165-9596 for the 01-04 Escapes. The reason the recall was only isolated to the 01-04 Escapes is unknown. In my opinion, whatever PCA actions or supplier related issues identified by Ford Motor Company during the prior recall investigation to support the recall only affecting 01-04 model years was either incomplete or inaccurate. The frame rust through condition extends beyond the model years listed in the"
2,1766833,2,through condition extends beyond the model years listed in the recall and was not resolved as shown by subsequent failures on vehicles outside of the recall model year window. No mechanism was put in place to prevent the control arm separation in the later model year Escapes to prevent the same safety issue from occurring. I was unable to obtain a front cradle from a local salvage yard that did not exhibit perforation present or starting supporting this condition is occurring on all model year Escapes from
3,1766833,3,this condition is occurring on all model year Escapes from 2001 to 2012 in corrosion prone areas. The recall should be extended to all model years involved to prevent potential accidents.
4,1766844,4,"Subject vehicle is a 2009 Ford Escape with 74,123 miles. Driving the vehicle the lower control arm separated from the front frame otherwise known as the engine cradle or subframe. The failure is identical to the subframe failures on the 2001 - 2004 Ford Escapes currently under recall. The failure was immediate with no prior warning. The vehicle exhibited loss of steering control, wheel lock up from the front tire contacting the back of the wheel opening, and loss of transmission engagement from cv joint"
5,1766844,5,"wheel opening, and loss of transmission engagement from cv joint separation. In my opinion, whatever PCA actions or quality root cause was identified by Ford Motor Company during the prior recall investigation isolating the condition to 2001 - 2004 Escapes was either incomplete or inaccurate. This is supported by the same failure occurring in Escapes outside of that model year window. No mechanism was added after the 2004 Escapes were produced to prevent this failure from occurring. Images attached show"
6,1766844,6,to prevent this failure from occurring. Images attached show the failure that was experienced. I was unable to obtain a good used frame locally that did not exhibit signs of corrosion with perforation present or starting. This further supports the failure mode is beyond the scope of the initial recall. The recall should be expanded to prevent future accidents that could have the potential for injury or death.
7,1871568,7,"The vehicle is only 6 years old and the audio control module stopped working, rendering me without a radio. Southgate Ford dealership in Southgate, MI informed me that this is a known issue. I would like to see accountability and have this audio control module recalled. A search online shows many users experience this issue. Not having a working radio is a safety issue. Having access to emergency radio bulletins while driving is imperative. A known issue should be recalled."
8,1900521,8,"I have been the only driver/owner since May of 2020. Vehicle has not been in any type of accident. 2020 Ford Escape has approximately 18,000 miles. I did purchase (bumper to bumper) warranty. Vehicle is scheduled for service on July 10, 2023. Ford dealerships are understaffed and can only service vehicles on urgency of issue or issues. July date was soonest and was from the 3rd Ford dealership I called. Issues started on May 13, 2023. While vehicle is parked and while it’s being driven, the following"
9,1900521,9,"vehicle is parked and while it’s being driven, the following warnings occur: •AWD Off •AWD Service Required •Powertrain Malfunction •Powertrain Service Required •Pre-collision Assist Not Available •Service Advance Trac •See Manual •Passenger window opens/closes on its own When vehicle is in park: •Power Lift-gate opens halfway or does not open •Doors unlock and lock on their own"
