##**Text Summary using Huggingface Transformers**

####**Python Code**

In [None]:
import os
import pandas as pd

# Directory containing markdown files
markdown_dir = "hrdataset/employees"

# Parse markdown files into a DataFrame
employee_data = []
for filename in os.listdir(markdown_dir):
    if filename.endswith(".md"):
        with open(os.path.join(markdown_dir, filename), "r") as f:
            lines = f.readlines()
            profile = {}
            for line in lines:
                if line.startswith("- **Name:**"):
                    profile["Name"] = line.split(":")[1].strip()
                elif line.startswith("- **Role:**"):
                    profile["Role"] = line.split(":")[1].strip()
                elif line.startswith("- **Department:**"):
                    profile["Department"] = line.split(":")[1].strip()
                elif line.startswith("- **Joining Date:**"):
                    profile["Joining Date"] = line.split(":")[1].strip()
            if profile:
                employee_data.append(profile)

# Convert to DataFrame
employee_df = pd.DataFrame(employee_data)
print(employee_df)


                 Name                   Role          Department  \
0  ** Rajesh Kulkarni                 ** CTO        ** Executive   
1    ** Neha Malhotra      ** Junior Analyst        ** Logistics   
2       ** Anjali Das        ** HR Executive  ** Human Resources   
3     ** Sunita Patil   ** Finance Executive          ** Finance   
4     ** Priya Sharma  ** Operations Manager       ** Operations   
5      ** Rohit Mehra   ** Logistics Analyst        ** Logistics   
6     ** Karan Kapoor    ** Fleet Supervisor        ** Logistics   
7       ** Meera Iyer   ** Marketing Manager        ** Marketing   
8      ** Aditya Jain    ** Senior Developer               ** IT   
9       ** Amit Verma                 ** CEO        ** Executive   

    Joining Date  
0  ** 2017-11-15  
1  ** 2023-07-01  
2  ** 2021-05-10  
3  ** 2022-01-17  
4  ** 2019-03-15  
5  ** 2020-08-22  
6  ** 2018-11-03  
7  ** 2020-02-20  
8  ** 2019-06-10  
9  ** 2016-02-01  


In [None]:
from transformers import pipeline

# Load pre-trained summarization model
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Combine employee details for a specific department
operations_data = " ".join([
    f"{row['Name']} is the {row['Role']} in {row['Department']}, joined on {row['Joining Date']}."
    for _, row in employee_df.iterrows() if row['Department'] == "Operations"
])

# Summarize the data
summary = summarizer(operations_data, max_length=50, min_length=10, do_sample=False)
print("Summary:", summary[0]['summary_text'])


config.json:   0%|          | 0.00/1.58k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Device set to use mps:0
Your max_length is set to 50, but your input_length is only 3. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=1)


Summary: CNN.com will feature iReporter photos in a weekly Travel Snapshots gallery. Please submit your best shots of New York for next week. Visit CNN.com/Travel next Wednesday for a new gallery of snapshots.


##**Example: Policy Summary in 50 words**

In [4]:
from transformers import pipeline

# Load pre-trained summarization model
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Combine employee details for a specific department
policy_data = "Health Insurance: Covers employee and dependents up to ₹5,00,000.Provident Fund: 12% of basic salary contributed to the PF account. Gratuity: Paid on retirement/resignation based on tenure. Travel Allowance: Reimbursement for official travel expenses. Skill Development: Reimbursement for approved certifications/training costs."

# Summarize the data
summary = summarizer(policy_data, max_length=50, min_length=10, do_sample=False)
print("Summary:", summary[0]['summary_text'])


Device set to use cpu


Summary: Provident Fund: 12% of basic salary contributed to the PF account. Gratuity: Paid on retirement/resignation based on tenure. Travel Allowance: Reimbursement for official travel expenses. Skill Development: Re
