<a href="https://colab.research.google.com/github/Farbod-gsm99/LLM-Traffic-Volume-Prediction/blob/main/Phi_3_medium_fine_tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%%capture
!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

# Install Flash Attention 2 for softcapping support
import torch
if torch.cuda.get_device_capability()[0] >= 8:
    !pip install --no-deps packaging ninja einops "flash-attn>=2.6.3"

* We support Llama, Mistral, Phi-3, Gemma, Yi, DeepSeek, Qwen, TinyLlama, Vicuna, Open Hermes etc
* We support 16bit LoRA or 4bit QLoRA. Both 2x faster.
* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.
* With [PR 26037](https://github.com/huggingface/transformers/pull/26037), we support downloading 4bit models **4x faster**! [Our repo](https://huggingface.co/unsloth) has Llama, Mistral 4bit models.
* [**NEW**] We make Phi-3 Medium / Mini **2x faster**! See our [Phi-3 Medium notebook](https://colab.research.google.com/drive/1hhdhBa1j_hsymiW9m-WzxQtgqTH_NHqi?usp=sharing)
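
The RoPE scaling bullet above can be illustrated with a small sketch of linear position interpolation, the idea usually associated with kaiokendev's method. This is not Unsloth's implementation: the function name `rope_angles`, the head dimension of 128, the base of 10000, and the 4096-token pretraining context are all assumptions chosen for illustration.

```python
def rope_angles(position, head_dim=128, base=10000.0, scale=1.0):
    """Return the rotary angles for one token position.

    scale > 1 divides the position index, which is the linear
    interpolation idea: longer sequences are squeezed into the
    position range seen during pretraining.
    """
    return [
        (position / scale) * base ** (-2 * i / head_dim)
        for i in range(head_dim // 2)
    ]

trained_ctx = 4096  # assumption: context length the base model was trained with
target_ctx = 8192   # a longer max_seq_length we would like to use
scale = target_ctx / trained_ctx

# The last position of the long sequence lands inside the trained range.
last_scaled = rope_angles(target_ctx - 1, scale=scale)
equivalent = rope_angles((target_ctx - 1) / scale)
print(scale, last_scaled[0], last_scaled == equivalent)
```

With a scale of 2, position 8191 is rotated exactly like the fractional position 4095.5 in the unscaled scheme, so the model never sees angles beyond those it was trained on.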

In [None]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 15 trillion tokens model 2x faster!
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # We also uploaded 4bit for 405b!
    "unsloth/Mistral-Nemo-Base-2407-bnb-4bit", # New Mistral 12b 2x faster!
    "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    "unsloth/mistral-7b-v0.3-bnb-4bit",        # Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Phi-3-medium-4k-instruct",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2024.12.2: Fast Mistral patching. Transformers:4.46.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 7.5. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors.index.json:   0%|          | 0.00/165k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/3.97G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/3.72G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/194 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/3.25k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/293 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/458 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]
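
To see why `load_in_4bit = True` matters on a 14.748 GB Tesla T4, here is a rough back-of-the-envelope estimate of the weight footprint. The parameter count of 14 billion is an assumption based on the nominal size of Phi-3 Medium, and the helper `weight_footprint_gb` is hypothetical. The estimate ignores quantization constants, activations, and optimizer state.

```python
def weight_footprint_gb(num_params, bits_per_param):
    """Approximate size of the raw weights in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

NUM_PARAMS = 14e9  # assumption: nominal "14B" parameter count

fp16_gb = weight_footprint_gb(NUM_PARAMS, 16)
four_bit_gb = weight_footprint_gb(NUM_PARAMS, 4)
print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:  ~{four_bit_gb:.1f} GB")
```

The 4-bit figure is in the same range as the two shards in the download log above (3.97 GB + 3.72 GB), while the 16-bit figure would not fit on this GPU at all.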

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.12.2 patched 40 layers with 40 QKV layers, 40 O layers and 40 MLP layers.
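
As a sanity check on how many weights LoRA adds, the sketch below counts adapter parameters for the seven target modules. A rank-`r` adapter on a linear layer of shape `(d_in, d_out)` adds `r * (d_in + d_out)` parameters. The layer dimensions are assumptions about Phi-3 Medium (hidden size 5120, MLP size 17920, key/value width 1280) and are not stated in this notebook. Only the rank of 16 and the 40 layers come from the cell and log above.

```python
def lora_params(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

# Assumed Phi-3 Medium dimensions (not taken from this notebook).
HIDDEN = 5120
INTERMEDIATE = 17920
KV_DIM = 1280      # assumed 10 key/value heads x 128 head dim
NUM_LAYERS = 40    # matches "patched 40 layers" in the log above
R = 16             # matches r = 16 in get_peft_model

shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in shapes.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} in total")
```

Under these assumptions the total agrees with the `Number of trainable parameters = 65,536,000` line that the trainer prints further down.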


<a name="Data"></a>
### Data Prep
We load the `TheFuriousGunner/CATraffic_Finetuning` dataset from the Hugging Face Hub and render its `system`, `user`, and `assistant` columns into the prompt template of the original [Alpaca dataset](https://crfm.stanford.edu/2023/03/13/alpaca.html). You can replace this code section with your own data prep.

**[NOTE]** To train only on completions (ignoring the user's input) read TRL's docs [here](https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only).

**[NOTE]** Remember to append the **EOS_TOKEN** to the end of every formatted example! Otherwise you'll get infinite generations!

If you want to use the `llama-3` template for ShareGPT datasets, try our conversational [notebook](https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing).

For text completions like novel writing, try this [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing).

In [None]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    # With batched = True, each column arrives as a list of values.
    instructions = examples["system"]
    inputs       = examples["user"]
    outputs      = examples["assistant"]
    texts = []
    # user_input avoids shadowing Python's built-in input().
    for instruction, user_input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, user_input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

from datasets import load_dataset
dataset = load_dataset("TheFuriousGunner/CATraffic_Finetuning", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

README.md:   0%|          | 0.00/31.0 [00:00<?, ?B/s]

finetuning_conversations.parquet:   0%|          | 0.00/15.7M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/119100 [00:00<?, ? examples/s]

Map:   0%|          | 0/119100 [00:00<?, ? examples/s]
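
To make the mapping step concrete, here is a minimal, self-contained sketch of what the formatting function produces for one row. The toy row, the `</s>` end-of-sequence string, and the name `format_batch` are illustrative assumptions. In the notebook the EOS string comes from `tokenizer.eos_token`.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS = "</s>"  # stand-in; the real value is tokenizer.eos_token

def format_batch(batch, eos_token=EOS):
    """Render parallel column lists into one training string per row."""
    texts = [
        ALPACA_PROMPT.format(system, user, assistant) + eos_token
        for system, user, assistant in zip(
            batch["system"], batch["user"], batch["assistant"]
        )
    ]
    return {"text": texts}

# A made-up row with the same three columns as the dataset.
toy_batch = {
    "system": ["Predict traffic."],
    "user": ["Past volumes: 1, 2, 3"],
    "assistant": ["[4, 5, 6]"],
}
formatted = format_batch(toy_batch)
print(formatted["text"][0])
```

Each training string ends with the EOS marker directly after the assistant's answer, which is what teaches the model where to stop generating.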

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We do 200 steps to speed things up, but for a full run you can set `num_train_epochs = 1` and remove `max_steps`. We also support TRL's `DPOTrainer`!

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 200,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

Map (num_proc=2):   0%|          | 0/119100 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs
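
Because `max_steps` overrides `num_train_epochs`, it helps to know how much of the dataset a 200-step run actually covers. The arithmetic below is a rough sketch using the values from the cell above and the 119,100 examples reported when the dataset was loaded. It assumes a single GPU, and the exact number of steps per epoch may differ by one depending on how the trainer rounds.

```python
import math

num_examples = 119_100
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 200

# Examples consumed per optimizer step (single GPU assumed).
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
examples_seen = effective_batch * max_steps
fraction_of_epoch = examples_seen / num_examples
steps_per_epoch = math.ceil(num_examples / effective_batch)

print(f"Effective batch size: {effective_batch}")
print(f"Examples seen in {max_steps} steps: {examples_seen}")
print(f"Share of one epoch: {fraction_of_epoch:.2%}")
print(f"Approximate steps for a full epoch: {steps_per_epoch}")
```

In other words, this run sees a little over one percent of the training data. A full epoch would need roughly 75 times as many steps.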


In [None]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.748 GB.
7.553 GB of memory reserved.


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 119,100 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 200
 "-____-"     Number of trainable parameters = 65,536,000


Step,Training Loss
1,1.9259
2,1.9276
3,1.9122
4,1.8655
5,1.827
6,1.7696
7,1.6948
8,1.6028
9,1.579
10,1.4531


In [None]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

7791.5822 seconds used for training.
129.86 minutes used for training.
Peak reserved memory = 9.455 GB.
Peak reserved memory for training = 1.902 GB.
Peak reserved memory % of max memory = 64.11 %.
Peak reserved memory for training % of max memory = 12.897 %.
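
The reported percentages follow directly from three numbers printed in this notebook. The sketch below redoes that arithmetic with a hypothetical helper, `memory_report`, so the figures can be checked without a GPU.

```python
def memory_report(start_gb, peak_gb, total_gb):
    """Recompute the memory summary from reserved-memory readings in GB."""
    training_gb = round(peak_gb - start_gb, 3)
    return {
        "training_gb": training_gb,
        "peak_pct": round(peak_gb / total_gb * 100, 3),
        "training_pct": round(training_gb / total_gb * 100, 3),
    }

# Values copied from the outputs above: memory reserved before training,
# peak reserved memory after training, and total memory on the Tesla T4.
report = memory_report(start_gb=7.553, peak_gb=9.455, total_gb=14.748)
print(report)
```

The extra memory attributed to training is small because the 4-bit base weights already occupy most of the reserved memory before `trainer.train()` is called.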


<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [None]:
import os
hf_token = os.environ["HF_TOKEN"] # Assumes the token is set as an environment variable; never hard-code it in a notebook
model.save_pretrained("Phi-3-medium-4k-LoRA") # Local saving
tokenizer.save_pretrained("Phi-3-medium-4k-LoRA")
model.push_to_hub("TheFuriousGunner/Phi-3-medium-4k-LoRA", token = hf_token) # Online saving
tokenizer.push_to_hub("TheFuriousGunner/Phi-3-medium-4k-LoRA", token = hf_token) # Online saving

README.md:   0%|          | 0.00/615 [00:00<?, ?B/s]

  0%|          | 0/1 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/262M [00:00<?, ?B/s]

Saved model to https://huggingface.co/TheFuriousGunner/Phi-3-medium-4k-LoRA


  0%|          | 0/1 [00:00<?, ?it/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]
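
The size of the uploaded `adapter_model.safetensors` file can be estimated from the trainable parameter count. This is a rough sketch that assumes the adapter weights are stored as 32-bit floats, which the notebook does not state.

```python
trainable_params = 65_536_000  # from the trainer banner above
bytes_per_param = 4            # assumption: adapters saved as float32

adapter_bytes = trainable_params * bytes_per_param
adapter_mb = adapter_bytes / 1e6
print(f"Estimated adapter size: {adapter_mb:.0f} MB")
```

Under that assumption the estimate is close to the 262M shown in the upload progress bar, which is also why sharing only the LoRA adapters is much cheaper than sharing the full model.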

## Few-shot Learning

In [None]:
from unsloth import FastLanguageModel
import torch

max_seq_length = 8192 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

import os

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "TheFuriousGunner/Phi-3-medium-4k-LoRA", # The LoRA adapters saved after training
    token = os.environ["HF_TOKEN"], # Assumes the token is set as an environment variable; never hard-code it
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference


prompt = """
Instructions: You are an expert traffic volume prediction model, that can predict the future volume values according to spatial temporal information. We want you to perform the traffic volume prediction task, considering the nearby environment and historical traffic volume data.\n Context knowledge you could consider:\n     - Traffic volume: the number of vehicles passing a specific region in an hour, usually ranging from 0 to 1000.\n     - Traffic pattern characteristic: Traffic flow patterns in a city are influenced by various area attributes. Also, traffic volume has a periodic daily and weekly pattern.\n     - Spatial temporal factors correlation: Traffic flow in an area will be affected by its nearby infrastructures, during specific periods for different areas. You should think about how the volume will change in a specific area, during a specific time.\n       For examples,\n             Airports, and train stations - increased volume on weekends and holidays.\n             Residential areas - more activities during morning and evening rush hours.\n             Commercial areas - busy during lunch hours and after-work periods.\n             Educational locations - high volume during peak hours near schools.\n Think carefully about the following questions about how spatial-temporal factors affect traffic flow.\n     - What is the attribute of this area and what is the predicted time zone located in special periods (like rush hours, weekdays, weekends, and holidays)?\n     - What are the traffic patterns of this area, and what is the change in different time slots?\n     - What is the historical temporal trend according to temporal information, considering the weekdays,  around holidays?
Examples:
[INST]
Input: Some important information is listed as follows:\n     - Location: District 4 in Alameda, California, USA, along the I580-W freeway, lane 4, direction of westbound. \n     - Today's weather: Sunny. Temperature is 10.1\\u00b0C, and visibility reaches 7.7 miles. \n     - Region information: including transportation areas and commercial areas within a range of 5 km.\n     - Current Time: 5 PM, 2018-1-14, Sunday, not a holiday.\n     - Traffic volume data in the past 12 hours were 61, 100, 142, 204, 278, 321, 363, 400, 409, 397, 373 and 349, respectively.\n According to the above information and careful reasoning, please predict traffic volumes in the next 12 hours (from 6 PM to 5 AM) and explain it. Format the final answer in a single line as a JSON dictionary like: {Traffic volume data in the next 12 hours: [V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11, V12], Explanation: xxx.}. Please think step by step.
[/INST]
Response: {Traffic volume data in the next 12 hours: [342, 285, 253, 216, 160, 99, 56, 34, 28, 24, 38, 106],
"Explanation": I will provide a step-by-step explanation for predicting traffic volumes in the next 12 hours based on the given information.
1. Time and Date: It is 5 PM on a Sunday, which is the weekend, with sunny weather and good visibility.
2. Historical Traffic Volume: In the past 12 hours, the traffic volumes were: [61, 100, 142, 204, 278, 321, 363, 400, 409, 397, 373, 349].
3. Current Situation: Given the weekend, the traffic flow pattern must differ from regular weekdays.
4. Weekend Factor: With no known holidays or events, historical data suggest that traffic volume will resemble previous Sundays.
5. Weather Impact: The sunny weather attracts more travelers than usual.
6. Traffic Trend Prediction:
- The evening hours (6 PM to 9 PM): Traffic volume is likely to decrease after the afternoon rush hour.
- Late-night hours (10 PM to 5 AM): Traffic volume may remain low throughout the night.}
[INST]
Input: Some important information is listed as follows:\n     - Location: District 7 in Los Angeles, California, USA, along the US101-N freeway, lane 4, direction of northbound. \n     - Today's weather: Sunny. Temperature is 16.0\\u00b0C, and visibility reaches 9.9 miles. \n     - Region information: including transportation areas, residential areas and commercial areas within a range of 5 km.\n     - Current Time: 4 AM, 2018-1-16, Tuesday, not a holiday.\n     - Traffic volume data in the past 12 hours were 460, 394, 273, 223, 185, 135, 92, 49, 34, 24, 28 and 73, respectively.\n According to the above information and careful reasoning, please predict traffic volumes in the next 12 hours (from 5 AM to 4 PM). Format the final answer in a single line as a JSON dictionary like: {Traffic volume data in the next 12 hours: [V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11, V12]}
[/INST]
Response: {Traffic volume data in the next 12 hours: [192, 354, 523, 518, 391, 352, 361, 363, 392, 429, 461, 508],
"Explanation": I will provide a step-by-step explanation for predicting traffic volumes in the next 12 hours based on the given information.
1. Time and Date: It is 4 AM on a Tuesday, which is a regular weekday, with sunny weather and good visibility.
2. Historical Traffic Volume: In the past 12 hours, the traffic volumes were: [460, 394, 273, 223, 185, 135, 92, 49, 34, 24, 28, 73].
3. Current Situation: Given the weekday and the sunny weather, the traffic pattern resembles that of regular weekdays.
4. Weekday Factor: The traffic pattern and traffic volume peaks of a regular weekday must be seen in the prediction.
5. Weather Impact: The sunny weather attracts more day travelers than rainy days.
6. Traffic Trend Prediction:
- Early morning hours (5 AM to 8 AM): Traffic volume starts to increase as people in the commercial and residential areas tend to get to work and school.
- Morning to noon (9 AM to 1 PM): Traffic volume maintains a lower amount than the early morning but remains relatively high as people in the transportation and commercial areas tend to move around due to work or lunch.
- Afternoon hours (2 PM to 4 PM): Traffic volume increases as people tend to get back from school or work.}
[INST]
Input: Some important information is listed as follows:\n     - Location: District 10 in San Joaquin, California, USA, along the I5-N freeway, lane 3, direction of northbound. \n     - Today's weather: Fog. Temperature is 10.8\\u00b0C, and visibility reaches 1.5 miles. \n     - Region information: including transportation areas within a range of 5 km.\n     - Current Time: 4 AM, 2018-1-16, Tuesday, not a holiday.\n     - Traffic volume data in the past 12 hours were 131, 112, 93, 86, 72, 59, 53, 42, 39, 38, 44 and 55, respectively.\n According to the above information and careful reasoning, please predict traffic volumes in the next 12 hours (from 5 AM to 4 PM). Format the final answer in a single line as a JSON dictionary like: {Traffic volume data in the next 12 hours: [V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11, V12]}
[/INST]
Response: {Traffic volume data in the next 12 hours: [77, 86, 104, 101, 95, 100, 100, 102, 111, 125, 147, 138],
"Explanation": I will provide a step-by-step explanation for predicting traffic volumes in the next 12 hours based on the given information.
1. Time and Date: It is 4 AM on a Tuesday, which is a regular weekday, with foggy weather and low visibility.
2. Historical Traffic Volume: In the past 12 hours, the traffic volumes were: [131, 112, 93, 86, 72, 59, 53, 42, 39, 38, 44, 55].
3. Current Situation: Given the weekday and the foggy weather, the average traffic flow is going to be much lower than on regular weekdays.
4. Weekday Factor: The traffic pattern and traffic volume peaks of a regular weekday must somewhat be seen in the prediction.
5. Weather Impact: The foggy weather attracts far fewer day travelers than sunny days.
6. Traffic Trend Prediction:
- Early morning hours (5 AM to 8 AM): Traffic volume starts to increase a little as people tend to use the transportation areas to get to work or school but is still much lower than usual due to foggy weather.
- Morning to noon (9 AM to 1 PM): Traffic volume maintains a consistently low amount as people tend to avoid unnecessary trips in foggy weather.
- Afternoon hours (2 PM to 4 PM): Traffic volume increases a little as people tend to get back from school or work.}
[INST]
Input: Some important information is listed as follows:\n     - Location: District 3 in Yolo, California, USA, along the US50-E freeway, lane 4, direction of eastbound. \n     - Today's weather: Sunny. Temperature is 9.5\\u00b0C, and visibility reaches 7.4 miles. \n     - Region information: including transportation areas, commercial areas and educational areas within a range of 5 km.\n     - Current Time: 0 AM, 2018-1-1, Monday, New Year's Day.\n     - Traffic volume data in the past 12 hours were 196, 208, 204, 174, 215, 176, 155, 126, 97, 70, 58 and 38, respectively.\n According to the above information and careful reasoning, please predict traffic volumes in the next 12 hours (from 1 AM to 12 PM). Format the final answer in a single line as a JSON dictionary like: {Traffic volume data in the next 12 hours: [V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11, V12]}
[/INST]
Response: {Traffic volume data in the next 12 hours: [53, 50, 40, 34, 28, 42, 46, 58, 90, 125, 170, 198],
"Explanation": I will provide a step-by-step explanation for predicting traffic volumes in the next 12 hours based on the given information.
1. Time and Date: It is 12 AM on a Monday, which is a holiday (New Year's Day), with sunny weather and good visibility.
2. Historical Traffic Volume: In the past 12 hours, the traffic volumes were: [196, 208, 204, 174, 215, 176, 155, 126, 97, 70, 58, 38].
3. Current Situation: Given the holiday and the sunny weather, the average traffic flow is going to be much lower than on regular weekdays.
4. Holiday Factor: The traffic pattern deviates from regular weekdays and the traffic volumes are much lower as people stay home for indoor celebrations.
5. Weather Impact: The sunny weather attracts more day travelers than rainy days.
6. Traffic Trend Prediction:
- Late-night hours (1 AM to 5 AM): Traffic volume is low as the night goes on.
- Early morning (6 AM to 9 AM): Traffic volume maintains a much lower amount than on normal weekdays due to the New Year's holiday.
- Late morning (10 AM to 12 PM): Traffic volume increases as people tend to use transportation areas to travel on holidays.}
[INST]
Input: Some important information is listed as follows:\n     - Location: District 7 in Los Angeles, California, USA, along the I405-N freeway, lane 4, direction of northbound. \n     - Today's weather: Sunny. Temperature is 15.0\\u00b0C, and visibility reaches 10.0 miles. \n     - Region information: including residential areas, transportation areas and commercial areas within a range of 5 km.\n     - Current Time: 2 PM, 2018-1-27, Saturday, not a holiday.\n     - Traffic volume data in the past 12 hours were 47, 57, 109, 180, 224, 320, 309, 368, 354, 362, 374 and 362, respectively.\n According to the above information and careful reasoning, please predict traffic volumes in the next 12 hours (from 3 PM to 2 AM). Format the final answer in a single line as a JSON dictionary like: {Traffic volume data in the next 12 hours: [V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11, V12]}
[/INST]
Response:
"""

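
The answer format requested in the prompt is not strict JSON (the key is unquoted), so `json.loads` would likely fail on the model's reply. One possible way to pull the twelve forecast values out of the generated text is a small regular-expression parser. The function name `parse_volumes` and the error handling are illustrative assumptions, not part of the original notebook.

```python
import re

def parse_volumes(response, horizon=12):
    """Extract the forecast list that follows 'next 12 hours:' in a reply."""
    match = re.search(r"next 12 hours:\s*\[([^\]]*)\]", response)
    if match is None:
        raise ValueError("No forecast list found in the response.")
    values = [int(v) for v in re.findall(r"-?\d+", match.group(1))]
    if len(values) != horizon:
        raise ValueError(f"Expected {horizon} values, got {len(values)}.")
    return values

# The first few-shot response from the prompt above, shortened.
sample = (
    "Response: {Traffic volume data in the next 12 hours: "
    "[342, 285, 253, 216, 160, 99, 56, 34, 28, 24, 38, 106],\n"
    '"Explanation": In the past 12 hours, the traffic volumes were: [61, 100].}'
)
volumes = parse_volumes(sample)
print(volumes)
```

Anchoring the pattern on the phrase `next 12 hours:` keeps the parser from picking up the historical list that the explanation repeats.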


# alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request."""
# inputs = tokenizer(
# [
#     alpaca_prompt.format(
#         """[INST]Below are a few examples of traffic volume prediction tasks. Please follow these examples and provide a prediction for the given scenario.

# Example 1:
# ### Instruction:
# {"Role: You are an expert traffic volume prediction model, that can predict the future volume values according to spatial temporal information. We want you to perform the traffic volume prediction task, considering the nearby environment and historical traffic volume data.\n Context knowledge you could consider:\n     - Traffic volume: the number of vehicles passing a specific region in an hour, usually ranging from 0 to 1000.\n     - Traffic pattern characteristic: Traffic flow patterns in a city are influenced by various area attributes. Also, traffic volume has a periodic daily and weekly pattern.\n     - Spatial temporal factors correlation: Traffic flow in an area will be affected by its nearby infrastructures, during specific periods for different areas. You should think about how the volume will change in a specific area, during a specific time.\n       For examples,\n             Airports, and train stations - increased volume on weekends and holidays.\n             Residential areas - more activities during morning and evening rush hours.\n             Commercial areas - busy during lunch hours and after-work periods.\n             Educational locations - high volume during peak hours near schools.\n Think carefully about the following questions about how spatial-temporal factors affect traffic flow.\n     - What is the attribute of this area and what is the predicted time zone located in special periods (like rush hours, weekdays, weekends, and holidays)?\n     - What are the traffic patterns of this area, and what is the change in different time slots?\n     - What is the historical temporal trend according to temporal information, considering the weekdays,  around holidays?"}
# ### Input:
# {"Some important information is listed as follows:\n     - Location: District 4 in Alameda, California, USA, along the I580-W freeway, lane 4, direction of westbound. \n     - Today's weather: Sunny. Temperature is 10.1\\u00b0C, and visibility reaches 7.7 miles. \n     - Region information: including transportation areas and commercial areas within a range of 5 km.\n     - Current Time: 5 PM, 2018-1-14, Sunday, not a holiday.\n     - Traffic volume data in the past 12 hours were 61, 100, 142, 204, 278, 321, 363, 400, 409, 397, 373 and 349, respectively.\n According to the above information and careful reasoning, please predict traffic volumes in the next 12 hours (from 6 PM to 5 AM) and explain it. Format the final answer in a single line as a JSON dictionary like: {Traffic volume data in the next 12 hours: [V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11, V12], Explanation: xxx.}. Please think step by step."}
# ### Response:
# {"{Traffic volume data in the next 12 hours: [342, 285, 253, 216, 160, 99, 56, 34, 28, 24, 38, 106],
# "Explanation":   I will provide a step-by-step explanation for predicting traffic volumes in the next 12 hours based on
# the given information.
# 1. Time and Date: It is 5 PM on a Sunday, which is the weekend, with sunny weather and
# good visibility.
# 2. Historical Traffic Volume: In the past 12 hours, the traffic volumes were: [61, 100, 142, 204, 278, 321, 363, 400, 409, 397, 373, 349]
# 3. Current Situation: Given the weekend, the traffic flow pattern must differ from regular weekdays.
# 4. Weekend Factor:  With no known holidays or events, historical data suggest that traffic volume will resemble previous Sundays.
# 5. Weather Impact: The sunny weather attracts more travelers than usual.
# 6. Traffic Trend Prediction:
# - The evening hours (6 PM to 9 PM): Traffic volume is likely to decrease after the afternoon rush hour.
# - Late-night hours (10 PM to 5 AM): Traffic volume may remain low throughout the night.}"}"
# Example 2:
# ### Instruction:
# {"Role: You are an expert traffic volume prediction model, that can predict the future volume values according to spatial temporal information. We want you to perform the traffic volume prediction task, considering the nearby environment and historical traffic volume data.\n Context knowledge you could consider:\n     - Traffic volume: the number of vehicles passing a specific region in an hour, usually ranging from 0 to 1000.\n     - Traffic pattern characteristic: Traffic flow patterns in a city are influenced by various area attributes. Also, traffic volume has a periodic daily and weekly pattern.\n     - Spatial temporal factors correlation: Traffic flow in an area will be affected by its nearby infrastructures, during specific periods for different areas. You should think about how the volume will change in a specific area, during a specific time.\n       For examples,\n             Airports, and train stations - increased volume on weekends and holidays.\n             Residential areas - more activities during morning and evening rush hours.\n             Commercial areas - busy during lunch hours and after-work periods.\n             Educational locations - high volume during peak hours near schools.\n Think carefully about the following questions about how spatial-temporal factors affect traffic flow.\n     - What is the attribute of this area and what is the predicted time zone located in special periods (like rush hours, weekdays, weekends, and holidays)?\n     - What are the traffic patterns of this area, and what is the change in different time slots?\n     - What is the historical temporal trend according to temporal information, considering the weekdays,  around holidays?"}
# ### Input:
# {"Some important information is listed as follows:\n     - Location: District 7 in Los Angeles, California, USA, along the US101-N freeway, lane 4, direction of northbound. \n     - Today's weather: Sunny. Temperature is 16.0\\u00b0C, and visibility reaches 9.9 miles. \n     - Region information: including transportation areas, residential areas and commercial areas within a range of 5 km.\n     - Current Time: 4 AM, 2018-1-16, Tuesday, not a holiday.\n     - Traffic volume data in the past 12 hours were 460, 394, 273, 223, 185, 135, 92, 49, 34, 24, 28 and 73, respectively.\n According to the above information and careful reasoning, please predict traffic volumes in the next 12 hours (from 5 AM to 4 PM). Format the final answer in a single line as a JSON dictionary like: {Traffic volume data in the next 12 hours: [V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11, V12]"}
# ### Response:
# {"{Traffic volume data in the next 12 hours: [192, 354, 523, 518, 391, 352, 361, 363, 392, 429, 461, 508],
# "Explanation": I will provide a step-by-step explanation for predicting traffic volumes in the next 12 hours based on
# the given information.
# 1. Time and Date: It is 4 AM on a tuesday, which is a regular weekday, with sunny weather and good visibility.
# 2. Historical Traffic Volume: In the past 12 hours, the traffic volumes were: [460, 394, 273, 223, 185, 135, 92, 49, 34, 24, 28, 73].
# 3. Current Situation: Given the weekday, and the sunny weather, the traffic pattern resembles the regular weekdays.
# 4. Weekday Factor: The traffic pattern and traffic volume peaks of a regular weekday must be seen in the prediction.
# 5. Weather Impact: The sunny weather attracts more day traveleres than rainy days.
# 6. Traffic Trend Prediction:
# - Early morning hours (5 AM to 8 AM): Traffic volume starts to increase as people in the commercial and residual areas tend to get to work and school.
# - Morning to noon (9 AM to 1 PM): Traffic volume maintains a lower amount than the early morning but remains relatively high as people in the transportation and commercial area tend to move in the area due to work or lunch.
# - Afternoon hours (2 PM to 4 PM): Traffic volume increases as people tend to get back from school or work. }"}
# Example 3:
# ### Instruction:
# {"Role: You are an expert traffic volume prediction model, that can predict the future volume values according to spatial temporal information. We want you to perform the traffic volume prediction task, considering the nearby environment and historical traffic volume data.\n Context knowledge you could consider:\n     - Traffic volume: the number of vehicles passing a specific region in an hour, usually ranging from 0 to 1000.\n     - Traffic pattern characteristic: Traffic flow patterns in a city are influenced by various area attributes. Also, traffic volume has a periodic daily and weekly pattern.\n     - Spatial temporal factors correlation: Traffic flow in an area will be affected by its nearby infrastructures, during specific periods for different areas. You should think about how the volume will change in a specific area, during a specific time.\n       For examples,\n             Airports, and train stations - increased volume on weekends and holidays.\n             Residential areas - more activities during morning and evening rush hours.\n             Commercial areas - busy during lunch hours and after-work periods.\n             Educational locations - high volume during peak hours near schools.\n Think carefully about the following questions about how spatial-temporal factors affect traffic flow.\n     - What is the attribute of this area and what is the predicted time zone located in special periods (like rush hours, weekdays, weekends, and holidays)?\n     - What are the traffic patterns of this area, and what is the change in different time slots?\n     - What is the historical temporal trend according to temporal information, considering the weekdays,  around holidays?"}
# ### Input:
# {"Some important information is listed as follows:\n     - Location: District 10 in San Joaquin, California, USA, along the I5-N freeway, lane 3, direction of northbound. \n     - Today's weather: Fog. Temperature is 10.8\\u00b0C, and visibility reaches 1.5 miles. \n     - Region information: including transportation areas within a range of 5 km.\n     - Current Time: 4 AM, 2018-1-16, Tuesday, not a holiday.\n     - Traffic volume data in the past 12 hours were 131, 112, 93, 86, 72, 59, 53, 42, 39, 38, 44 and 55, respectively.\n According to the above information and careful reasoning, please predict traffic volumes in the next 12 hours (from 5 AM to 4 PM). Format the final answer in a single line as a JSON dictionary like: {Traffic volume data in the next 12 hours: [V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11, V12]"}
# ### Response:
# {"{Traffic volume data in the next 12 hours: [77, 86, 104, 101, 95, 100, 100, 102, 111, 125, 147, 138],
# "Explanation": I will provide a step-by-step explanation for predicting traffic volumes in the next 12 hours based on
# the given information.
# 1. Time and Date: It is 4 AM on a Tuesday, which is a regular weekday, with foggy weather and low visibility.
# 2. Historical Traffic Volume: In the past 12 hours, the traffic volumes were: [131, 112, 93, 86, 72, 59, 53, 42, 39, 38, 44, 55].
# 3. Current Situation: Given the weekday, and the foggy weather, the average traffic flow is going to be much lower than regular weekdays.
# 4. Weekday Factor: The traffic pattern and traffic volume peaks of a regular weekday must somewhat be seen in the prediction.
# 5. Weather Impact: The foggy weather attracts far fewer day travelers than sunny days.
# 6. Traffic Trend Prediction:
# - Early morning hours (5 AM to 8 AM): Traffic volume starts to increase a little as people tend to use the transportation areas to get to work or school but is still much lower than regular due to foggy weather.
# - Morning to noon (9 AM to 1 PM): Traffic volume maintains a consistent low amount as people tend to avoid unnecessary trips in foggy weather.
# - Afternoon hours (2 PM to 4 PM): Traffic volume increases a little as people tend to get back from school or work. }"}
# Example 4:
# ### Instruction:
# {"Role: You are an expert traffic volume prediction model, that can predict the future volume values according to spatial temporal information. We want you to perform the traffic volume prediction task, considering the nearby environment and historical traffic volume data.\n Context knowledge you could consider:\n     - Traffic volume: the number of vehicles passing a specific region in an hour, usually ranging from 0 to 1000.\n     - Traffic pattern characteristic: Traffic flow patterns in a city are influenced by various area attributes. Also, traffic volume has a periodic daily and weekly pattern.\n     - Spatial temporal factors correlation: Traffic flow in an area will be affected by its nearby infrastructures, during specific periods for different areas. You should think about how the volume will change in a specific area, during a specific time.\n       For examples,\n             Airports, and train stations - increased volume on weekends and holidays.\n             Residential areas - more activities during morning and evening rush hours.\n             Commercial areas - busy during lunch hours and after-work periods.\n             Educational locations - high volume during peak hours near schools.\n Think carefully about the following questions about how spatial-temporal factors affect traffic flow.\n     - What is the attribute of this area and what is the predicted time zone located in special periods (like rush hours, weekdays, weekends, and holidays)?\n     - What are the traffic patterns of this area, and what is the change in different time slots?\n     - What is the historical temporal trend according to temporal information, considering the weekdays,  around holidays?"}
# ### Input:
# {"Some important information is listed as follows:\n     - Location: District 3 in Yolo, California, USA, along the US50-E freeway, lane 4, direction of eastbound. \n     - Today's weather: Sunny. Temperature is 9.5\\u00b0C, and visibility reaches 7.4 miles. \n     - Region information: including transportation areas, commercial areas and educational areas within a range of 5 km.\n     - Current Time: 0 AM, 2018-1-1, Monday, New Year's Day.\n     - Traffic volume data in the past 12 hours were 196, 208, 204, 174, 215, 176, 155, 126, 97, 70, 58 and 38, respectively.\n According to the above information and careful reasoning, please predict traffic volumes in the next 12 hours (from 1 AM to 12 PM). Format the final answer in a single line as a JSON dictionary like: {Traffic volume data in the next 12 hours: [V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11, V12]"}
# ### Response:
# {"{Traffic volume data in the next 12 hours:  [53, 50, 40, 34, 28, 42, 46, 58, 90, 125, 170, 198],
# "Explanation": I will provide a step-by-step explanation for predicting traffic volumes in the next 12 hours based on the given information.
# 1. Time and Date: It is 12 AM on a Monday, which is a holiday (New Year's Day), with sunny weather and good visibility.
# 2. Historical Traffic Volume: In the past 12 hours, the traffic volumes were: [196, 208, 204, 174, 215, 176, 155, 126, 97, 70, 58, 38].
# 3. Current Situation: Given the holiday, and the sunny weather, the average traffic flow is going to be much lower than regular weekdays.
# 4. Holiday Factor: The traffic pattern deviates from regular weekdays and the traffic volumes are much lower as people stay home for indoor celebrations.
# 5. Weather Impact: The sunny weather attracts more day travelers than rainy days.
# 6. Traffic Trend Prediction:
# - Late-night hours (1 AM to 5 AM): Traffic volume is low as the night goes on.
# - Early morning (6 AM to 9 AM): Traffic volume maintains a much lower amount than normal weekdays due to new year's holiday.
# - Late morning (9 AM to 12 PM): Traffic volume increases as people tend to use transportation areas to travel on holidays. }"}
# Answer the following example like the other ones provided:
# ### Instruction:
# {"Role: You are an expert traffic volume prediction model, that can predict the future volume values according to spatial temporal information. We want you to perform the traffic volume prediction task, considering the nearby environment and historical traffic volume data.\n Context knowledge you could consider:\n     - Traffic volume: the number of vehicles passing a specific region in an hour, usually ranging from 0 to 1000.\n     - Traffic pattern characteristic: Traffic flow patterns in a city are influenced by various area attributes. Also, traffic volume has a periodic daily and weekly pattern.\n     - Spatial temporal factors correlation: Traffic flow in an area will be affected by its nearby infrastructures, during specific periods for different areas. You should think about how the volume will change in a specific area, during a specific time.\n       For examples,\n             Airports, and train stations - increased volume on weekends and holidays.\n             Residential areas - more activities during morning and evening rush hours.\n             Commercial areas - busy during lunch hours and after-work periods.\n             Educational locations - high volume during peak hours near schools.\n Think carefully about the following questions about how spatial-temporal factors affect traffic flow.\n     - What is the attribute of this area and what is the predicted time zone located in special periods (like rush hours, weekdays, weekends, and holidays)?\n     - What are the traffic patterns of this area, and what is the change in different time slots?\n     - What is the historical temporal trend according to temporal information, considering the weekdays,  around holidays?"}
# ### Input:
# {"Some important information is listed as follows:\n     - Location: District 7 in Los Angeles, California, USA, along the I405-N freeway, lane 4, direction of northbound. \n     - Today's weather: Sunny. Temperature is 15.0\\u00b0C, and visibility reaches 10.0 miles. \n     - Region information: including residential areas, transportation areas and commercial areas within a range of 5 km.\n     - Current Time: 2 PM, 2018-1-27, Saturday, not a holiday.\n     - Traffic volume data in the past 12 hours were 47, 57, 109, 180, 224, 320, 309, 368, 354, 362, 374 and 362, respectively.\n According to the above information and careful reasoning, please predict traffic volumes in the next 12 hours (from 3 PM to 2 AM). Format the final answer in a single line as a JSON dictionary like: {Traffic volume data in the next 12 hours: [V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11, V12]"}
# [/INST]
# ### Response:
# {}"""
#     )
# ], return_tensors = "pt").to("cuda")

inputs = tokenizer(
[
    prompt
], return_tensors = "pt").to("cuda")


outputs = model.generate(**inputs, max_new_tokens = 2048, use_cache = True)
tokenizer.batch_decode(outputs)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2024.12.4: Fast Mistral patching. Transformers:4.46.3.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 7.5. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Unsloth: unsloth/phi-3-medium-4k-instruct-bnb-4bit can only handle sequence lengths of at most 4096.
But with kaiokendev's RoPE scaling of 2.0, it can be magically be extended to 8192!


Unsloth 2024.12.4 patched 40 layers with 40 QKV layers, 40 O layers and 40 MLP layers.


['"\nInstructions: You are an expert traffic volume prediction model, that can predict the future volume values according to spatial temporal information. We want you to perform the traffic volume prediction task, considering the nearby environment and historical traffic volume data.\n Context knowledge you could consider:\n     - Traffic volume: the number of vehicles passing a specific region in an hour, usually ranging from 0 to 1000.\n     - Traffic pattern characteristic: Traffic flow patterns in a city are influenced by various area attributes. Also, traffic volume has a periodic daily and weekly pattern.\n     - Spatial temporal factors correlation: Traffic flow in an area will be affected by its nearby infrastructures, during specific periods for different areas. You should think about how the volume will change in a specific area, during a specific time.\n       For examples,\n             Airports, and train stations - increased volume on weekends and holidays.\n             

## Ground truth Extraction

In [None]:
import json

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""


EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["system"]
    inputs       = examples["user"]
    # outputs      = examples["assistant"]
    # Leave every response blank. A bare "" here would make zip() yield nothing.
    outputs      = [""] * len(instructions)
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

from datasets import load_dataset
dataset_eval = load_dataset("TheFuriousGunner/CATraffic_Finetuning_eval", split = "test")
# dataset = dataset.map(formatting_prompts_func, batched = True,)

# Keep the reference answers for the first 1000 evaluation examples
ground_truth = dataset_eval[:1000]["assistant"]

with open('Ground Truth_mini.json', 'w') as file:
    json.dump(ground_truth, file, indent=4)


Generating test split:   0%|          | 0/50660 [00:00<?, ? examples/s]
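The saved ground-truth entries are still raw strings. A minimal sketch of turning one into a list of numbers is shown below. It assumes each answer contains a bracketed list like the `[V1, V2, ..., V12]` format requested in the prompt; `parse_volumes` and the `sample` string are hypothetical and not part of the original notebook.

```python
import re

def parse_volumes(text):
    """Return the numbers inside the first [...] list found in `text`, or None."""
    match = re.search(r"\[([^\[\]]*)\]", text)
    if match is None:
        return None
    # Accept integers and decimals; everything is returned as float
    return [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", match.group(1))]

# Hypothetical answer string mirroring the format requested in the prompt
sample = "{Traffic volume data in the next 12 hours: [77, 86, 104]}"
print(parse_volumes(sample))  # [77.0, 86.0, 104.0]
```

Only the bracketed part is searched, so the `12` in "next 12 hours" is not picked up as a volume.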

You can also use a `TextStreamer` for continuous inference, so you can see the generation token by token instead of waiting for the whole output!
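A minimal sketch of streaming generation, assuming `model` and `tokenizer` are the objects loaded earlier in this notebook. The helper name `generate_streaming` and its parameters are hypothetical; `TextStreamer` and the `streamer` argument of `generate` come from `transformers`.

```python
def generate_streaming(model, tokenizer, prompt, max_new_tokens=128,
                       streamer=None, device="cuda"):
    """Generate text while printing tokens to stdout as they are produced."""
    if streamer is None:
        # Imported lazily so the helper can be defined before transformers is loaded
        from transformers import TextStreamer
        streamer = TextStreamer(tokenizer, skip_prompt=True)  # do not echo the prompt
    inputs = tokenizer([prompt], return_tensors="pt").to(device)
    return model.generate(**inputs, streamer=streamer,
                          max_new_tokens=max_new_tokens, use_cache=True)
```

For example, `generate_streaming(model, tokenizer, alpaca_prompt.format(system_prompt, user_prompt, ""))` would print the response as it is generated.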

## Prediction Extraction

In [None]:
import json


alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

# Function to extract the relevant part of the output
def extract_traffic_volume(output):
    # Find the start and end of the relevant part
    start = output.find('### Response:')
    if start == -1:
        print("Start marker '### Response:' not found")
        return None
    start += len('### Response:')
    # Check for -1 before adding the marker length, otherwise the test never fires
    end = output.find(']}', start)
    if end == -1:
        print("End marker ']}' not found")
        return None
    return output[start:end + 2].strip()


traffic_volumes = []
print(len(dataset_eval["system"]))

# Only the first 1000 evaluation examples are used
num_samples = min(1000, len(dataset_eval))
for i in range(num_samples):
    example = dataset_eval[i]
    inputs = tokenizer(
    [
        alpaca_prompt.format(
            example["system"], # instruction
            example["user"], # input
            "", # output - leave this blank for generation!
        )
    ], return_tensors = "pt").to("cuda")

    output = model.generate(**inputs, max_new_tokens = 1024, use_cache = True)
    output_decode = tokenizer.batch_decode(output)
    str_output = ''.join(map(str, output_decode))
    extracted_data = extract_traffic_volume(str_output)
    # Failed extractions are skipped, so positions may not line up with ground_truth
    if extracted_data:
        traffic_volumes.append(extracted_data)
    if i % 10 == 0:
        print(i, 'outputs extracted')

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["system"]
    inputs       = examples["user"]
    # outputs      = examples["assistant"]
    # Leave every response blank. A bare "" here would make zip() yield nothing.
    outputs      = [""] * len(instructions)
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

# Write the extracted data to a JSON file
with open('Predictions_Phi3_medium.json', 'w') as f:
    json.dump(traffic_volumes, f, indent=4)

50660
0 outputs extracted
10 outputs extracted
20 outputs extracted
30 outputs extracted
40 outputs extracted
50 outputs extracted
60 outputs extracted
70 outputs extracted
80 outputs extracted
90 outputs extracted
100 outputs extracted
110 outputs extracted
120 outputs extracted
130 outputs extracted
140 outputs extracted
150 outputs extracted
160 outputs extracted
170 outputs extracted
180 outputs extracted
190 outputs extracted
200 outputs extracted
210 outputs extracted
220 outputs extracted
230 outputs extracted
240 outputs extracted
250 outputs extracted
260 outputs extracted
270 outputs extracted
280 outputs extracted
290 outputs extracted
300 outputs extracted
310 outputs extracted
320 outputs extracted
330 outputs extracted
340 outputs extracted
350 outputs extracted
360 outputs extracted
370 outputs extracted
380 outputs extracted
390 outputs extracted
400 outputs extracted
410 outputs extracted
420 outputs extracted
430 outputs extracted
440 outputs extracted
450 outputs ext
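Once both JSON files exist, the predictions can be scored against the ground truth. Below is a minimal sketch of mean absolute error (MAE) and root mean squared error (RMSE), assuming both sides have already been parsed into lists of numeric lists and are aligned by position. The function `mae_rmse` and the toy values are hypothetical. Because the loop above skips outputs whose extraction failed, the two lists may not line up one-to-one unless failures are tracked separately.

```python
import math

def mae_rmse(truth, preds):
    """Return (MAE, RMSE) over all paired values in two lists of numeric lists."""
    abs_sum = 0.0
    sq_sum = 0.0
    count = 0
    for t_row, p_row in zip(truth, preds):
        if len(t_row) != len(p_row):
            continue  # skip predictions with the wrong horizon length
        for t, p in zip(t_row, p_row):
            abs_sum += abs(t - p)
            sq_sum += (t - p) ** 2
            count += 1
    if count == 0:
        raise ValueError("no comparable values")
    return abs_sum / count, math.sqrt(sq_sum / count)

# Toy values for illustration only, not results from the model
truth = [[10, 20, 30], [40, 50, 60]]
preds = [[12, 18, 33], [40, 55, 60]]
mae, rmse = mae_rmse(truth, preds)
print(mae, rmse)
```

Rows whose lengths differ are skipped rather than padded, so a malformed prediction does not silently distort the averages.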

Now, if you want to load the LoRA adapters we just saved for inference, change `False` to `True`:

In [None]:
if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

# alpaca_prompt = You MUST copy from above!

inputs = tokenizer(
[
    alpaca_prompt.format(
        "What is a famous tall tower in Paris?", # instruction
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)

["Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is a famous tall tower in Paris?\n\n### Input:\n\n\n### Response:\n\nThe famous tall tower in Paris is the Eiffel Tower. It was built between 1887 and 1889 as the entrance arch for the 1889 World's Fair and has become a global cultural icon of France. It is named after the engineer Gustave Eiff"]

You can also use `AutoPeftModelForCausalLM` from Hugging Face's `peft` library. Only use this if you do not have `unsloth` installed. It can be very slow, since `4bit` model downloading is not supported, and Unsloth's **inference is 2x faster**.

In [None]:
if False:
    # Strongly NOT recommended - use Unsloth if possible
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer
    model = AutoPeftModelForCausalLM.from_pretrained(
        "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        load_in_4bit = load_in_4bit,
    )
    tokenizer = AutoTokenizer.from_pretrained("lora_model")

### Saving to float16 for vLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

In [None]:
# Merge to 16bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# Merge to 4bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False: model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")

### GGUF / llama.cpp Conversion
We now support saving to `GGUF` / `llama.cpp` natively! We clone `llama.cpp` and save to `q8_0` by default. All quantization methods such as `q4_k_m` are allowed. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.

In [None]:
# Save to 8bit Q8_0
if False: model.save_pretrained_gguf("model", tokenizer,)
# Remember to go to https://huggingface.co/settings/tokens for a token!
# And change hf to your username!
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

# Save to multiple GGUF options - much faster if you want multiple!
if False:
    model.push_to_hub_gguf(
        "hf/model", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_k_m", "q8_0", "q5_k_m",],
        token = "", # Get a token at https://huggingface.co/settings/tokens
    )

Now, use the `model-unsloth.gguf` file or `model-unsloth-Q4_K_M.gguf` file in `llama.cpp` or a UI-based system like `GPT4All`. You can install GPT4All by going [here](https://gpt4all.io/index.html).