# Data Quality Starter

### Case 1: Detect company names that are fake data or spam

We need to evaluate the quality of leads who have expressed interest in Snowflake.
Because the data was collected through a form that is open to anyone, some entries may be fake or spam.
We can use the Snowflake Cortex `COMPLETE` LLM function to detect and filter out spam from our data set.

In [None]:
ALTER SESSION SET QUERY_TAG = '{
    "origin": "sf_sit-is", 
    "name": "marketing_data_foundation_starter_spcs", 
    "version": {"major": 3, "minor": 0},
    "attributes": {"is_quickstart": 0, "source": "notebook"}
}';

-- Classify each lead as "legitimate" or "spam" with one Cortex LLM call per row.
select first_name, last_name, title, company,
    snowflake.cortex.complete('mistral-7b'
        , [
        {'role': 'system', 'content': 'You are a marketing expert working at Snowflake Inc. Your job is to evaluate the quality of leads who have expressed interest in Snowflake on a form available to anyone who visits the Snowflake website. '
            || 'Please classify if the data entered is one of these two categories: legitimate or spam. Consider each field on its own and in combination with the other fields. '
            || 'Here are some attributes of high quality leads: The job title should be one that would use or buy cloud software. The company name should appear to be a real organization. '
            || 'Here are some attributes of spam leads: The data contains references to famous people who would not be buying Snowflake. Any of the fields contain gibberish text or offensive content. Most fields contain just a single character or invalid punctuation. '
            || 'Wrap the category classification with an xml tag <response>. '
            || 'You should provide only "legitimate" or "spam" and no text other than the xml tag' },
        {'role': 'user', 'content': '<data>'
            || 'Name: '
            || first_name || ' ' || last_name
            || ', job title: '
            || title
            || ', company: '
            || company
            || '</data>'}
            ]
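        , {'max_tokens': 10}  -- cap the reply length; the model should return only the tagged label
        ):choices[0]:messages::string as spam_filter_raw
    -- Extract the label between the <response> tags; NULL when the tags are missing or incomplete.
    , regexp_substr(spam_filter_raw, '<response>(.*?)<\/response>', 1, 1, 'e', 1) as spam_filter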
from LLM_DEMO.DEMO.customer_information;
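
Once each lead has a `spam_filter` label, a natural next step is to check how the labels are distributed and then keep only the legitimate leads. The sketch below assumes the result of the query above was saved to a table named `LLM_DEMO.DEMO.customer_information_classified`; that table name is hypothetical and not part of the original notebook. Rows where the model reply could not be parsed (for example, because the reply was cut off before the closing tag) are reported as `unparsed` rather than silently dropped.

```sql
-- Assumption: the classification query's output was saved to this
-- hypothetical table, for example with CREATE TABLE ... AS SELECT.

-- 1. Count leads per category, including replies that failed to parse.
select
    coalesce(lower(trim(spam_filter)), 'unparsed') as category,
    count(*) as lead_count
from LLM_DEMO.DEMO.customer_information_classified
group by 1
order by lead_count desc;

-- 2. Keep only the leads the model labelled as legitimate.
select first_name, last_name, title, company
from LLM_DEMO.DEMO.customer_information_classified
where lower(trim(spam_filter)) = 'legitimate';
```

If the `unparsed` count is high, inspecting `spam_filter_raw` for those rows may show whether the prompt or the `max_tokens` limit needs adjusting.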