In [19]:
!pip install langchain_community
!pip install replicate



In [20]:
from langchain_community.llms import Replicate
import os
from google.colab import userdata
import pandas as pd

# Set the API token
api_token = userdata.get('REPLICATE_API_TOKEN')
os.environ["REPLICATE_API_TOKEN"] = api_token

# Model setup
model = "ibm-granite/granite-3.2-8b-instruct"
output = Replicate(
    model=model,
    replicate_api_token=api_token,
)

# Load data from CSV file in sample_data folder
def load_data_from_csv(file_path):
    try:
        df = pd.read_csv(file_path)
        if 'slang' in df.columns:
            return df['slang'].tolist()
        elif 'formal' in df.columns:
            return df['formal'].tolist()
        else:
            # If neither expected column exists, fall back to the first column
            return df.iloc[:, 0].tolist()
    except Exception as e:
        print(f"Error loading file: {e}")
        return []

# Path to your file in sample_data folder
file_path = '/content/sample_data/slang-indo.csv'
data = load_data_from_csv(file_path)

# Check if data were loaded
if not data:
    print("No data found or error loading file!")
else:
    print(f"Loaded {len(data)} data from file")

    # Join the loaded entries into one numbered block for the prompt
    ini = "\n".join([f"Review {i+1}: {review}" for i, review in enumerate(data)])

    prompt = f"""
    The two parallel columns have the same meaning, but the form of language is different. The slang column is filled with the slang language / Everyday language that people use on the internet, while the formal column is filled with the formal language form of the slang columns so that it is easier to understand. Please deduce what 10 patterns are formed from formal language to slang language:
    {ini}
    """

    # Invoke the model with the example prompt
    response = output.invoke(prompt)

    # Print the response
    print("\nGranite Model Response:\n")
    print(response)

Loaded 4412 data from file

Granite Model Response:

Here are 10 patterns formed from formal language to slang language:

1. Review 1: wow -> yaa
2. Review 2: aminn -> amin
3. Review 3: met -> ke
4. Review 4: netaas -> nta
5. Review 5: keberpa -> kebpa
6. Review 6: eeeehhhh -> ehh
7. Review 7: kata2nyaaa -> kata2nya
8. Review 8: hallo -> hai
9. Review 9: kaka -> kakak
10. Review 10: ka -> kau

These patterns show a general trend of abbreviating words, omitting vowels, or using common internet slang abbreviations. For example, "wow" becomes "yaa", "aminn" becomes "amin", and "hallo" becomes "hai". The use of "kak" for "kaka" and "ka" for "kau" also reflects the informal, conversational tone of internet slang.
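
Note that `load_data_from_csv` returns a single column, so the prompt above contains only one column even though it describes two parallel ones. The response appears to pair up consecutive rows instead of slang/formal pairs. A minimal sketch of one way to send real pairs follows. It assumes the CSV has both a `slang` and a `formal` column; `build_pair_lines`, `max_rows`, and `seed` are hypothetical names, and the cap of 200 rows is an arbitrary choice to keep the prompt small, not a documented model limit.

```python
import random


def build_pair_lines(slang, formal, max_rows=200, seed=0):
    """Format two parallel lists as 'slang -> formal' lines for a prompt."""
    if len(slang) != len(formal):
        raise ValueError("slang and formal must have the same length")
    pairs = list(zip(slang, formal))
    # Draw a reproducible random sample when there are more rows than max_rows.
    if len(pairs) > max_rows:
        pairs = random.Random(seed).sample(pairs, max_rows)
    return "\n".join(f"{s} -> {f}" for s, f in pairs)


# Toy rows reusing pattern examples that appear elsewhere in this notebook.
slang = ["yng", "gtu", "rata2"]
formal = ["yang", "begitu", "rata-rata"]
print(build_pair_lines(slang, formal))
```

With the real file, the two lists could come from `df['slang'].tolist()` and `df['formal'].tolist()`, provided both columns exist.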


In [21]:
# Define refined prompt
refined_prompt = f"""
Complete the task in 2 steps.
Step 1: Please classify the following data based on these patterns; Adding consonants and vowels (baik -> baaiikkk), removing vowels (yang -> yng), abbreviating with an abstract pattern (begitu -> gtu), removing a letter (skali - > sekali), adding numbers to repeat words (rata-rata -> rata2). Similar patterns can be put together. Please do not translate the data when displayed on the output
Step 2: count the total of the each patterns variabels above
{ini}
"""
# Invoke the model with refined prompt
response = output.invoke(refined_prompt)
# Print the response
print("Granite Model Refined Response:\n")
print(response)

Granite Model Refined Response:

Step 1: Classification of data based on the given patterns:

1. Adding consonants and vowels: 1 (baik -> baaiikkk)
2. Removing vowels: 1 (yang -> yng)
3. Abbreviating with an abstract pattern: 2 (begitu -> gtu, slalu -> slu)
4. Removing a letter: 1 (skali -> sekali)
5. Adding numbers to repeat words: 1 (rata-rata -> rata2)

Step 2: Count of each pattern variables:

1. Adding consonants and vowels: 1
2. Removing vowels: 1
3. Abbreviating with an abstract pattern: 2
4. Removing a letter: 1
5. Adding numbers to repeat words: 1

The provided data does not contain instances of all the mentioned patterns. Therefore, the counts are as follows:

- Adding consonants and vowels: 1 instance (baik -> baaiikkk)
- Removing vowels: 1 instance (yang -> yng)
- Abbreviating with an abstract pattern: 2 instances (begitu -> gtu, slalu -> slu)
- Removing a letter: 1 instance (skali -> sekali)
- Adding numbers to repeat words: 1 instance (rata-rata -> rata2)
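
The response above mostly repeats the examples given in the prompt instead of classifying the dataset. For comparison, the patterns named in the prompt can be approximated with deterministic rules. The sketch below is an assumption-laden illustration, not the method used in this notebook: the category names, the helper functions, and the rule order are all hypothetical, and it expects (formal, slang) pairs rather than the single column loaded earlier.

```python
import re
from collections import Counter

VOWELS = set("aiueo")


def squeeze(word):
    """Collapse runs of a repeated character: 'baaiikkk' -> 'baik'."""
    return re.sub(r"(.)\1+", r"\1", word)


def deleted_letters(formal, slang):
    """Letters dropped from formal to obtain slang, or None if slang is not a subsequence."""
    dropped, j = [], 0
    for ch in formal:
        if j < len(slang) and ch == slang[j]:
            j += 1
        else:
            dropped.append(ch)
    return dropped if j == len(slang) else None


def classify(formal, slang):
    """Assign one rough pattern label to a (formal, slang) pair."""
    if slang.endswith("2") and formal == f"{slang[:-1]}-{slang[:-1]}":
        return "number reduplication"
    if len(slang) > len(formal) and squeeze(slang) == squeeze(formal):
        return "elongation"
    dropped = deleted_letters(formal, slang)
    if dropped:
        if all(ch in VOWELS for ch in dropped):
            return "vowel removal"
        return "abbreviation"
    return "other"


# Pairs taken from the examples in the prompt above.
pairs = [("baik", "baaiikkk"), ("yang", "yng"), ("begitu", "gtu"), ("rata-rata", "rata2")]
counts = Counter(classify(f, s) for f, s in pairs)
print(counts)
```

In this sketch a single dropped vowel, as in sekali -> skali, is labeled "vowel removal", so it overlaps with the prompt's separate "removing a letter" category. Any real analysis would need to decide how to separate the two.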


In [35]:
prompt = f"""
The two parallel columns have the same meaning, but the form of language is different. The slang column is filled with the slang language / Everyday language that people use on the internet, while the formal column is filled with the formal language form of the slang columns so that it is easier to understand.
please conclude 3 Most repeated formal columns written based on the data and calculate the total number of each. Please do not translate the data when displayed on the output
{ini}
"""

# Invoke the model with the example prompt
response = output.invoke(prompt)

# Print the response
print("\nGranite Model Response:\n")
print(response)


Granite Model Response:

The most repeated formal columns are:

1. "amin" - 11 occurrences
2. "slm" - 9 occurrences
3. "tq" - 7 occurrences

The total number of each is as follows:

1. "amin": 11
2. "slm": 9
3. "tq": 7
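
The counts reported above cannot be verified from the response itself, and "slm" and "tq" look like slang abbreviations rather than formal forms, which is consistent with the loader passing only the `slang` column when it exists. Exact frequencies are easier to obtain deterministically. A minimal sketch using `collections.Counter` follows; `top_words` is a hypothetical helper and the sample list is invented for illustration, not taken from the dataset.

```python
from collections import Counter


def top_words(words, n=3):
    """Return the n most frequent entries with their exact counts."""
    # Normalize case and surrounding whitespace so duplicates are merged.
    cleaned = [str(w).strip().lower() for w in words]
    return Counter(cleaned).most_common(n)


# Invented toy values, only to show the shape of the result.
sample = ["amin", "Amin", "salam", "amin ", "salam", "terima kasih"]
print(top_words(sample))
# → [('amin', 3), ('salam', 2), ('terima kasih', 1)]
```

On the real data, the input would be the formal column, for example `df['formal'].tolist()` if that column exists in the CSV.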


In [41]:
ref_prompt = f"""
please conclude 3 Most common words on the formal columns based on the data. Please do not translate the data when displayed on the output. and then calculate the total number of each
{ini}
"""

# Invoke the model with the refined prompt
response = output.invoke(ref_prompt)

# Print the response
print("\nGranite Model Response:\n")
print(response)


Granite Model Response:

Based on the provided data, the three most common words are:

1. amiiinn (11 occurrences)
2. aminnn (9 occurrences)
3. yaaa (8 occurrences)

The total number of each word is as follows:

1. amiiinn: 11
2. aminnn: 9
3. yaaa: 8


In [42]:
ref_prompt = f"""
please conclude 3 less common words on the formal columns based on the data. Please do not translate the data when displayed on the output. and then calculate the total number of each
{ini}
"""

# Invoke the model with the refined prompt
response = output.invoke(ref_prompt)

# Print the response
print("\nGranite Model Response:\n")
print(response)


Granite Model Response:

Three less common words from the formal columns are:

1. "subhanallah" (188, 189, 190)
2. "astagfirullah" (186, 187, 188)
3. "mashaallah" (191, 192, 193)

These words are considered less common in formal writing due to their religious connotations and are more frequently used in informal or colloquial contexts. Their usage count in the data is as follows:

- subhanallah: 3
- astagfirullah: 3
- mashaallah: 3
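
The figures in parentheses above (for example 188, 189, 190) look like row labels rather than counts, and the reported count of 3 for every word cannot be checked from the response alone. "Least common" is also ambiguous when many entries occur only once. The sketch below shows one deterministic way to inspect the low end of the distribution. `least_common` and `singletons` are hypothetical helpers, and the toy list is invented for illustration.

```python
from collections import Counter


def least_common(words, n=3):
    """Return the n least frequent entries, rarest first."""
    counts = Counter(words)
    # most_common() sorts by descending count; slice from the end backwards.
    return counts.most_common()[:-n - 1:-1]


def singletons(words):
    """Return every entry that occurs exactly once, sorted alphabetically."""
    counts = Counter(words)
    return sorted(w for w, c in counts.items() if c == 1)


# Invented toy values, only to show the shape of the result.
words = ["a", "a", "a", "b", "b", "c"]
print(least_common(words, n=2))
print(singletons(words))
```

If the list of singletons is long, reporting its size may be more informative than picking three arbitrary entries from it.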
