<header style="padding:1px;background:#f9f9f9;border-top:3px solid #00b2b1"><img id="Teradata-logo" src="https://www.teradata.com/Teradata/Images/Rebrand/Teradata_logo-two_color.png" alt="Teradata" width="220" align="right" />

<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Manufacturing Defect Analysis</b>
</header>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>Introduction:</b></p>

<p style = 'font-size:16px;font-family:Arial'>Consider a major auto manufacturer as our client. The client has reported a serious business issue: warranty repairs are increasing. The increase is driven primarily by battery pack replacements in its electric vehicles (EVs). For an EV manufacturer, the battery is one of the most expensive and critical components that go into the product. The client wants you to find the root cause and provide actionable insights.</p>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>Data:</b></p>
<p style = 'font-size:16px;font-family:Arial'>The data for this demonstration resides on Vantage as well as AWS S3. In a modern analytic ecosystem, vast amounts of data are collected from manufacturing lines and even sensors on the finished product. That data can be left on inexpensive cloud storage and accessed when investigating a problem. We have low-volume, high-value data referenced frequently inside Vantage that has gone through traditional ETL processes to ensure quality, ease of analysis, and performance.  
</p>
<p style = 'font-size:16px;font-family:Arial'>In this demonstration, we will use the structured data inside Vantage to narrow down a problem, then go out to the cloud storage to define a view that shreds a subset of that data so we can join both data sets in a single query to solve our business issue.  A description of the tables involved in this demo is at the end of this notebook.
</p>



<p style = 'font-size:16px;font-family:Arial;color:#E37C4D'><b>Steps in the analysis:</b></p>
<ol style = 'font-size:16px;font-family:Arial'>
    <li>Initiate a connection to Vantage</li>    
    <li>Narrow down the root cause</li>
        <ul>
            <li>2.1 Were the defects based on the dealers?</li>
            <li>2.2 Were the defects based on the model of the cars?</li>
            <li>2.3 Were the defects based on the assembly plants?</li>
            <li>2.4 Were the defects based on the battery cells?</li>
            <li>2.5 Were the defects based on the lot numbers?</li>
        </ul>
    <li>Analysis of Test reports from Data Lake</li>
        <ul>
            <li>3.1 Create a foreign table to access the JSON data from Amazon S3</li>
            <li>3.2 Access and join the JSON manufacturing test data natively in Vantage</li>
        </ul>
    <li>Cleanup</li>
</ol>

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>1. Connect to Vantage</b>

In [None]:
import pandas as pd
import getpass
from teradataml import *
import plotly.express as px

pd.set_option('display.max_colwidth', None)

import warnings
warnings.filterwarnings('ignore')

<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=ManufacturingDefects.ipynb;' UPDATE FOR SESSION; ''')

<p style = 'font-size:16px;font-family:Arial'>Begin running steps with Shift + Enter keys. </p>

<p style = 'font-size:20px;font-family:Arial;color:#E37C4D'><b>Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial'>We have provided data for this demo on cloud storage. You can either run the demo using foreign tables, which access the data without using any storage in your environment, or download the data to local storage, which may yield faster execution but requires available storage space. The following cell contains two statements, one of which is commented out. To switch modes, move the comment marker to the other statement.</p>

In [None]:
# %run -i ../run_procedure.py "call get_data('DEMO_EVCarBattery_cloud');"        # Takes 15 seconds
%run -i ../run_procedure.py "call get_data('DEMO_EVCarBattery_local');"        # Takes 30 seconds

<p style = 'font-size:16px;font-family:Arial'>Next is an optional step – if you want to see the status of databases/tables created and space used.</p>

In [None]:
%run -i ../run_procedure.py "call space_report();"        # Takes 10 seconds

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>2. Narrow down the root cause</b>
<p style = 'font-size:16px;font-family:Arial'>This section will try to find the source of defective batteries. We'll answer questions such as whether bad batteries are installed in a specific model, whether cars with bad batteries are sold by a particular dealer, etc.</p>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>2.1 Were the defects based on the dealers?</b></p>
<p style = 'font-size:16px;font-family:Arial'>Is any specific dealer the cause of the increased warranty claims?</p>

In [None]:
warnings.filterwarnings('ignore')

query = '''
SELECT d.company, count(*) as Num
FROM DEMO_EVCarBattery.dealers d, DEMO_EVCarBattery.bad_batteries bb,
DEMO_EVCarBattery.vehicles v
WHERE bb.vin = v.vin
AND v.dealer_id = d.id
GROUP BY d.company; 
'''

res = DataFrame.from_query(query)
res = res.to_pandas()

fig = px.pie(res, values='Num', names='Company', title='Proportion of bad battery warranty claims by dealers')
fig.show()

<p style = 'font-size:16px;font-family:Arial'>Warranty claims for cars with defective batteries are spread evenly across all dealers, so no dealer stands out as the culprit. The issue lies in the earlier stages of the manufacturing pipeline.</p>

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>2.2 Were the defects based on the model of the cars?</b></p>
<p style = 'font-size:16px;font-family:Arial'>Has any specific car model been fitted with defective batteries? We use the same set of battery parts in several different models across our product line.</p>

In [None]:
warnings.filterwarnings('ignore')

query = '''
SELECT v.model, count(*) as Num
FROM DEMO_EVCarBattery.vehicles v, DEMO_EVCarBattery.bad_batteries bb
WHERE bb.vin = v.vin
GROUP BY v.model;
'''

res = DataFrame.from_query(query)
res = res.to_pandas()


fig = px.pie(res, values='Num', names='model', title='Proportion of bad battery warranty claims by vehicle model')
fig.show()

<p style = 'font-size:16px;font-family:Arial'>Warranty claims for cars with defective batteries are spread almost evenly across all models, so no car model stands out as the culprit. The issue lies in the earlier stages of the manufacturing pipeline.</p>

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>2.3 Were the defects based on the assembly plants?</b></p>
<p style = 'font-size:16px;font-family:Arial'>Is any specific manufacturing plant installing the defective batteries in the cars?</p>

In [None]:
warnings.filterwarnings('ignore')

query = '''
SELECT mfg.company, count(*) as Num
FROM DEMO_EVCarBattery.mfg_plants mfg, DEMO_EVCarBattery.bad_batteries bb,
DEMO_EVCarBattery.vehicles v
WHERE bb.vin = v.vin
AND v.mfg_plant_id = mfg.id
GROUP BY mfg.company;
'''

res = DataFrame.from_query(query)
res = res.to_pandas()


fig = px.pie(res, values='Num', names='Company', title='Proportion of bad battery warranty claims by manufacturing plant')
fig.show()

<p style = 'font-size:16px;font-family:Arial'>A whopping 81.4% of warranty claims for defective batteries come from a single manufacturing plant, the Jackson Plant. We have found the culprit here!</p>
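
<p style = 'font-size:16px;font-family:Arial'>Reading percentages off a pie chart can be imprecise, so it may help to compute each category's share as a table. The following is a minimal sketch, not part of the original demo: the helper name claim_shares is hypothetical, and the counts are made-up values used only to show the calculation.</p>

```python
import pandas as pd

def claim_shares(counts: pd.DataFrame, label_col: str, count_col: str = "Num") -> pd.DataFrame:
    """Return each category's percentage share of all claims, largest first."""
    out = counts[[label_col, count_col]].copy()
    out["share_pct"] = (100 * out[count_col] / out[count_col].sum()).round(1)
    return out.sort_values("share_pct", ascending=False).reset_index(drop=True)

# Hypothetical counts, for illustration only (not the demo's real numbers)
toy = pd.DataFrame({"Company": ["Plant A", "Plant B", "Plant C"], "Num": [80, 15, 5]})
shares = claim_shares(toy, "Company")
print(shares)
```

<p style = 'font-size:16px;font-family:Arial'>In the notebook, a call such as claim_shares(res, 'Company') could be applied to the result of the query above, assuming the columns come back named Company and Num as the plotting call expects.</p>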

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>2.4 Were the defects based on the battery cells?</b></p>
<p style = 'font-size:16px;font-family:Arial'>Although we have found the manufacturing plant that installed the most defective batteries, it still makes sense to dig deeper into the finer details. Let's find out which battery cells (types of battery) are installed in cars with bad batteries:</p>

In [None]:
warnings.filterwarnings('ignore')

query = '''
SELECT bom.part_no, p.description, count(*) as Num
FROM DEMO_EVCarBattery.bom bom, DEMO_EVCarBattery.bad_batteries bb, DEMO_EVCarBattery.parts p
WHERE bb.vin = bom.vin
AND bom.part_no = p.part_no
AND p.description LIKE 'Battery Cell%'
GROUP BY bom.part_no, p.description;
'''

res = DataFrame.from_query(query)
res = res.to_pandas()


fig = px.pie(res, values='Num', names='part_no', title='Proportion of bad battery warranty claims by battery cell')
fig.show()

<p style = 'font-size:16px;font-family:Arial'>Ok, we have an issue with part_no '20rd0'! This part (a type of EV battery) accounts for the majority of warranty claims for bad batteries.</p>

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>2.5 Were the defects based on the lot numbers?</b></p>
<p style = 'font-size:16px;font-family:Arial'>In the previous step, we found the exact part that was the cause of the bad batteries. But are all the batteries with part_no '20rd0' faulty or is there any correlation with the lot number? In simple terms, were there any specific lot(s) which produced faulty batteries? We store detailed manufacturing data in our integrated data warehouse. Let us use that to perform the analysis.</p>

In [None]:
warnings.filterwarnings('ignore')

query = '''
SELECT bom.part_no, bom.lot_no, p.description, count(*) as Num
FROM DEMO_EVCarBattery.bom bom, DEMO_EVCarBattery.bad_batteries bb, DEMO_EVCarBattery.parts p
WHERE bb.vin = bom.vin
AND p.part_no = bom.part_no
AND p.description LIKE 'Battery Cell%'
GROUP BY bom.part_no, bom.lot_no, p.description
;
'''

res = DataFrame.from_query(query)
res.sort('Num', ascending= False)


<p style = 'font-size:16px;font-family:Arial'>Ok, now we know the underlying issue with part_no '20rd0': the majority of the failures come from battery lot '4102', which turns out to have been delivered to the Jackson Plant. This lot contains a considerable number of faulty batteries that are driving our warranty replacements.
<br>
<br>
Great! So we have found the root cause of the increased warranty claims for faulty batteries. Could we do more to provide some actionable insights?</p>
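
<p style = 'font-size:16px;font-family:Arial'>The link between a lot and a plant can be checked by joining the bill of materials to the vehicles and then to the plants. The sketch below shows that join in pandas under stated assumptions: the column names follow the Dataset section at the end of this notebook, the rows are hypothetical, and the helper name lots_by_plant is not part of the demo.</p>

```python
import pandas as pd

# Hypothetical rows that mirror the column names in the Dataset section
bom = pd.DataFrame({
    "vin": ["V1", "V2", "V3", "V4"],
    "part_no": ["20rd0", "20rd0", "20rd0", "20rd0"],
    "lot_no": ["4102", "4102", "4102", "3001"],
})
vehicles = pd.DataFrame({
    "vin": ["V1", "V2", "V3", "V4"],
    "mfg_plant_id": [1, 1, 1, 2],
})
mfg_plants = pd.DataFrame({"id": [1, 2], "Company": ["Plant A", "Plant B"]})

def lots_by_plant(bom, vehicles, mfg_plants):
    """Count vehicles per (lot_no, plant) by joining bom -> vehicles -> mfg_plants."""
    joined = (
        bom.merge(vehicles, on="vin")
           .merge(mfg_plants, left_on="mfg_plant_id", right_on="id")
    )
    return (
        joined.groupby(["lot_no", "Company"])
              .size()
              .reset_index(name="Num")
              .sort_values("Num", ascending=False)
              .reset_index(drop=True)
    )

lot_plant = lots_by_plant(bom, vehicles, mfg_plants)
print(lot_plant)
```

<p style = 'font-size:16px;font-family:Arial'>If a lot was delivered to a single plant, it appears in exactly one row of the result. The same join could also be written in SQL against the DEMO_EVCarBattery tables.</p>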

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>3. Analysis of Test reports from Data Lake</b>
<p style = 'font-size:16px;font-family:Arial'>Taking this analysis even further, we want to understand how we can detect bad batteries before they end up in our customers' cars. This will help us avoid expensive warranty repair cycles and poor customer satisfaction in the future. When the cars are manufactured, we store detailed test reports for the various parts and subsystems that comprise the vehicle. This voluminous, semi-structured data is loaded directly into our Data Lake, which is housed in an object store (AWS S3).
<br>
<br>
Using <b>Teradata Vantage</b> we can <b>natively</b> pull in this data and use it for our analysis!</p>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>3.1 Create a foreign table to access the JSON data from Amazon S3</b></p>

In [None]:
%%capture
query = '''
CREATE FOREIGN TABLE test_reports
(
    Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
    payload JSON(16776192) INLINE LENGTH 64000 CHARACTER SET LATIN)
USING (
    Location ('/s3/s3.amazonaws.com/trial-datasets/EVCarBattery')
), NO PRIMARY INDEX;
'''

execute_sql(query)

In [None]:
query = '''
SELECT *
FROM test_reports
SAMPLE 2;
'''

res = DataFrame.from_query(query)
res

<p style = 'font-size:16px;font-family:Arial'>The raw output above is hard to read, so let us put a user-friendly view on top of the foreign table to shred the files and make the test report data easier to access:</p>

In [None]:
%%capture
query = '''
REPLACE VIEW test_reports_v AS
(SELECT vin, part_no, lot_no, CAST(test_report AS JSON) test_report
FROM TD_JSONSHRED(
    ON (
                SELECT payload.vin as vin, payload
                FROM test_reports)
            USING
            ROWEXPR('parts')
            COLEXPR('part_no', 'lot_no', 'test_report') 
            RETURNTYPES('VARCHAR(17)', 'VARCHAR(1000)', 'VARCHAR(10000)')
        ) AS d1 (vin, part_no, lot_no, test_report)
    )
'''

execute_sql(query)

In [None]:
query = '''
SELECT *
FROM test_reports_v
SAMPLE 10;
'''

res = DataFrame.from_query(query)
res

<p style = 'font-size:16px;font-family:Arial'>So now we have test reports for parts of each vehicle. Batteries have detailed test reports rather than just pass/fail. Can we use this data to provide any valuable insights?</p>
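
<p style = 'font-size:16px;font-family:Arial'>Conceptually, shredding turns one nested document per vehicle into one flat row per part. The sketch below imitates that idea in plain Python and is not how TD_JSONSHRED is implemented. The field names (vin, parts, part_no, lot_no, test_report) are taken from the view definition above; the payload values and the helper name shred are hypothetical.</p>

```python
import json

# Hypothetical payload; the field names follow the view definition,
# the values are made up for illustration.
payload = json.loads("""
{
  "vin": "VIN00000000000001",
  "parts": [
    {"part_no": "20rd0", "lot_no": "4102", "test_report": {"Rated Capacity": 6.5}},
    {"part_no": "p-001", "lot_no": "77", "test_report": {"result": "pass"}}
  ]
}
""")

def shred(doc, row_key="parts", columns=("part_no", "lot_no", "test_report")):
    """Yield one flat row per element of doc[row_key], carrying vin along."""
    for part in doc.get(row_key, []):
        yield (doc["vin"],) + tuple(part.get(col) for col in columns)

rows = list(shred(payload))
for row in rows:
    print(row)
```

<p style = 'font-size:16px;font-family:Arial'>Here row_key plays the role of ROWEXPR and columns the role of COLEXPR: the first selects the array to expand, the second selects the fields extracted from each element.</p>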

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>3.2 Access and join the JSON manufacturing test data natively in Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial'>Different parts report different data when tested. The test report for the battery has detailed data on its performance after assembly but before it is fitted into the vehicle.</p>
<p style = 'font-size:16px;font-family:Arial'>We want to compare the rated and measured capacities along with part/lot numbers for just the batteries - we can easily drill into the JSON data using simple dot notation to access the test results we need.</p>

In [None]:
query = '''
SELECT tr.part_no, p.description, tr.lot_no, 
tr.test_report."Rated Capacity" AS rated_capacity,
tr.test_report."Static Capacity Test"."Measured Average Capacity" AS measured_capacity
FROM DEMO_EVCarBattery.parts p, test_reports_v tr
WHERE  p.part_no = tr.part_no
AND p.description LIKE 'Battery Cell%';
'''

res = DataFrame.from_query(query).to_pandas(all_rows = True)
res['measured_capacity'] = round(res['measured_capacity'].astype('float'), 2)
res.head(10)
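
<p style = 'font-size:16px;font-family:Arial'>The dot notation in the query above corresponds to ordinary nested key lookups. As a rough illustration, the sketch below performs the same two lookups on a parsed JSON document in Python. The key names come from the query; the values and the helper name get_path are hypothetical.</p>

```python
import json

# Hypothetical test report: key names come from the query, values are made up
report = json.loads("""
{
  "Rated Capacity": 6.5,
  "Static Capacity Test": {"Measured Average Capacity": 6.21}
}
""")

def get_path(doc, path):
    """Follow a sequence of keys into nested dicts; return None if any key is missing."""
    current = doc
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

rated = get_path(report, ["Rated Capacity"])
measured = get_path(report, ["Static Capacity Test", "Measured Average Capacity"])
print(rated, measured)
```

<p style = 'font-size:16px;font-family:Arial'>Returning None for a missing key matters in practice, because parts other than batteries may not carry these fields in their test reports.</p>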

In [None]:
warnings.filterwarnings('ignore')

fig = px.violin(res, y = "measured_capacity", color = "lot_no")
fig.add_hline(y = 6.9)
fig.add_hline(y = 6.0)
fig.show()

<p style = 'font-size:16px;font-family:Arial'>The above graph shows the distribution of measured battery capacity, grouped by lot number. Each apparent vertical line is a violin plot; zoom in for a clearer picture. A violin plot shows the probability density of the data, here the measured capacity per lot. Hover over a violin plot to see values such as the min, max, median, and quantiles.
<br>
<br>
One plot stands out: the pink one, which goes below the lower threshold. This plot is for <b>lot_no 4102</b>. Looking closely, we see that its distribution is skewed toward the bottom. The next graph shows the violin plot for lot_no 4102 on its own.</p>

In [None]:
warnings.filterwarnings('ignore')

fig = px.violin(res[res.lot_no == '4102'], y="measured_capacity", color = "lot_no", points='all')
fig.add_hline(y = 6.9)
fig.add_hline(y = 6.5)
fig.add_hline(y = 6.0)
fig.show()

<p style = 'font-size:16px;font-family:Arial'>The battery packs in lot_no 4102 are within the specification, but their measured capacities sit much lower than those of the other battery lots. A significant number of batteries are concentrated in the lower part of the graph, i.e. their measured capacity is below the capacity rated for this lot (6.5).</p>

<p style = 'font-size:16px;font-family:Arial;color:#E37C4D'><b>Insights:</b></p>
<ol style = 'font-size:16px;font-family:Arial'>
    <li>Test reports need to be proactively monitored in order to avoid installing faulty batteries</li>
    <li>Tighten up the acceptance criteria of batteries</li>
</ol>
<p style = 'font-size:16px;font-family:Arial'>These initiatives will increase product quality and make sure this doesn't happen again!</p>
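
<p style = 'font-size:16px;font-family:Arial'>One way to act on the first insight is to summarize incoming test reports per lot and flag lots whose cells mostly measure below the rated capacity. The following is a minimal sketch under stated assumptions: the helper name flag_low_capacity_lots and the min_share cutoff are hypothetical, the measurements are made up, and the rated value of 6.5 is the one quoted above for lot 4102.</p>

```python
import pandas as pd

def flag_low_capacity_lots(df, rated, min_share=0.5):
    """Flag lots where at least min_share of cells measure below the rated capacity."""
    summary = (
        df.assign(below_rated=df["measured_capacity"] < rated)
          .groupby("lot_no")
          .agg(median_capacity=("measured_capacity", "median"),
               share_below_rated=("below_rated", "mean"))
          .reset_index()
    )
    summary["flagged"] = summary["share_below_rated"] >= min_share
    return summary

# Hypothetical measurements, for illustration only
toy = pd.DataFrame({
    "lot_no": ["4102", "4102", "4102", "4102", "5000", "5000", "5000", "5000"],
    "measured_capacity": [6.1, 6.2, 6.3, 6.6, 6.6, 6.7, 6.8, 6.4],
})
report = flag_low_capacity_lots(toy, rated=6.5)
print(report)
```

<p style = 'font-size:16px;font-family:Arial'>A lot can pass a simple range check and still be flagged here, which is the situation seen with lot 4102. The cutoff is a judgment call and would need to be agreed with the quality team.</p>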

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>4. Cleanup</b>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>Work Tables</b></p>
<p style = 'font-size:16px;font-family:Arial'>Clean up the work table and view to prevent errors the next time the notebook is run.</p>

In [None]:
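# test_reports_v is a view, so it needs db_drop_view rather than db_drop_table.
# Drop the view first because it depends on the foreign table.
try:
    db_drop_view(view_name='test_reports_v')
except Exception:
    pass  # the view may not exist if section 3 was skipped

try:
    db_drop_table(table_name='test_reports')
except Exception:
    pass  # the table may not exist if section 3 was skipped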

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'> <b>Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial'>The following code will clean up tables and databases created above.</p>

In [None]:
%run -i ../run_procedure.py "call remove_data('DEMO_EVCarBattery');"        # Takes 5 seconds

In [None]:
remove_context()

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Dataset</b>

<b>bom</b> - Bill of materials - contains all major parts that make up each vehicle:

- `id`: unique identifier
- `vin`: vehicle identification number
- `part_no`: part number
- `vendor_id`: vendor the part was produced by (unused)
- `lot_no`: lot number from the vendor
- `quantity`: how many of this part are in the vehicle

<b>dealers</b> - Vehicle sales and distributors:

- `id`: unique identifier
- `Company`: company name
- `StreetAddress`: street address
- `City`: city
- `State`: state
- `ZipCode`: postcode
- `Country`: country
- `EmailAddress`: main email address
- `TelephoneNumber`: telephone number
- `DomainName`: URL for company website
- `Latitude`: latitude (location)
- `Longitude`: longitude (location)

<b>mfg_plants</b> - Manufacturing facilities:

- `id`: unique identifier
- `Company`: facility name
- `StreetAddress`: street address
- `City`: city
- `State`: state
- `ZipCode`: postcode
- `Country`: country
- `EmailAddress`: main email address
- `TelephoneNumber`: telephone number
- `DomainName`: URL for plant website
- `Latitude`: latitude (location)
- `Longitude`: longitude (location)

<b>parts</b> - Master list of parts for all vehicles:

- `part_no`: unique part number
- `description`: part description


<b>vehicles</b> - Vehicles we have built/are building:

- `vin`: unique identifier
- `yr`: model year
- `model`: vehicle model code
- `customer_id`: customer / purchaser
- `dealer_id`: dealer where vehicle was sold/delivered
- `mfg_plant_id`: plant where the vehicle was assembled

<footer style="padding:10px;background:#f9f9f9;border-bottom:3px solid #394851">Copyright © Teradata Corporation - 2023. All Rights Reserved.</footer>