#**Text Summarization using Hugging Face Transformers**
- Summarize content using a Hugging Face model

###**Install Dependencies**

In [1]:
!pip install transformers



To generate summaries with a Transformers model, we first download a dataset.

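###**Download the `hrdataset.zip` file from the CloudYuga GitHub repo**

- Saves it in the current working directory of the notebook (e.g., /content/ in Google Colab).
- If a file named `hrdataset.zip` already exists there, `wget` keeps it and saves the new copy under a numbered name such as `hrdataset.zip.1`, as in the output below.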

In [2]:
!wget https://github.com/cloudyuga/mastering-genai-w-python/raw/refs/heads/main/hrdataset.zip

--2025-05-23 06:11:28--  https://github.com/cloudyuga/mastering-genai-w-python/raw/refs/heads/main/hrdataset.zip
Resolving github.com (github.com)... 140.82.113.4
Connecting to github.com (github.com)|140.82.113.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/cloudyuga/mastering-genai-w-python/refs/heads/main/hrdataset.zip [following]
--2025-05-23 06:11:28--  https://raw.githubusercontent.com/cloudyuga/mastering-genai-w-python/refs/heads/main/hrdataset.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9530 (9.3K) [application/zip]
Saving to: ‘hrdataset.zip.1’


2025-05-23 06:11:28 (73.5 MB/s) - ‘hrdataset.zip.1’ saved [9530/9530]
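To avoid numbered duplicates when the cell is re-run, the download can be skipped if the file is already present. The following is a minimal sketch using only the standard library; the helper name `download_if_missing` is hypothetical and not part of the original notebook.

```python
import os
import urllib.request


def download_if_missing(url: str, dest: str) -> bool:
    """Download `url` to `dest` unless `dest` already exists.

    Returns True when a download happened, False when it was skipped.
    (Hypothetical helper, not part of the original notebook.)
    """
    if os.path.exists(dest):
        return False
    urllib.request.urlretrieve(url, dest)
    return True


# Example call (needs network access):
# download_if_missing(
#     "https://github.com/cloudyuga/mastering-genai-w-python/raw/refs/heads/main/hrdataset.zip",
#     "hrdataset.zip",
# )
```

This never refreshes an existing file, so delete `hrdataset.zip` first if a fresh copy is wanted.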



###**Unzip `hrdataset.zip` file**
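- It will automatically create the **`hrdataset`** folder in our current working directory (/content/ in Google Colab).
- If the folder already exists from an earlier run, `unzip` asks before replacing each file, as in the output below; `unzip -o` overwrites without asking and `unzip -n` skips existing files.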

In [3]:
!unzip hrdataset.zip

Archive:  hrdataset.zip
replace hrdataset/policies/leave_policies.md? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace hrdataset/policies/training_and_development.md? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace hrdataset/policies/employee_benefits.md? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace hrdataset/policies/holiday_calendar.md? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace hrdataset/policies/events_calendar.md? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace hrdataset/surveys/Employee_Culture_Survey_Responses.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace hrdataset/employees/108_Rajesh_Kulkarni.md? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace hrdataset/employees/106_Neha_Malhotra.md? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace hrdataset/employees/103_Anjali_Das.md? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace hrdataset/employees/105_Sunita_Patil.md? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace hrdataset/employees/101_Priya_Sharma.md? [y]es, [n]o, [A]ll
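The same extraction can also be done from Python, which avoids the interactive prompts entirely. This is a rough sketch with the standard `zipfile` module; the function name `extract_archive` is an assumption made for illustration.

```python
import zipfile


def extract_archive(zip_path: str, target_dir: str = ".") -> list[str]:
    """Extract every member of `zip_path` into `target_dir`.

    Existing files are overwritten without prompting.
    Returns the member names stored in the archive.
    (Hypothetical helper, not part of the original notebook.)
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Example call, assuming hrdataset.zip is in the working directory:
# print(extract_archive("hrdataset.zip"))
```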

##**Example: Policy Summary in about 50 words**

###**Python Code to Read Policies from Dataset**

In [4]:
# Read all content
import os

policy_dir = "hrdataset/policies"
all_policies_content = {}

for filename in os.listdir(policy_dir):
    if filename.endswith(".md"):
        with open(os.path.join(policy_dir, filename), "r", encoding="utf-8") as f:
            lines = f.readlines()
            if not lines:
                continue
            # First line is the title
            title = lines[0].strip().replace("#", "").strip()
            # Remaining lines as content
            content = " ".join([line.strip() for line in lines[1:] if line.strip()])
            all_policies_content[title] = content

# Optional: View all titles
print("✅ Loaded Policies:\n", list(all_policies_content.keys()))


✅ Loaded Policies:
 ['Leave Policies', 'Training and Development', 'Employee Benefits', 'Holiday Calendar', 'Events and Holiday Calendar']


###**Print the policy contents**

In [5]:
for title, content in all_policies_content.items():
    print(f"\n🗂️ {title}\n{'=' * (len(title) + 4)}\n{content}\n")


🗂️ Leave Policies
- **Annual Leave:** 18 days of paid leave per year, accrued monthly. - **Sick Leave:** 12 days of paid leave for medical reasons per year. - **Maternity Leave:** 6 months of paid leave for expecting mothers. - **Paternity Leave:** 15 days of paid leave for new fathers.L - **Compensatory Leave:** Leave granted for working on weekends or holidays.


🗂️ Training and Development
| Employee ID | Name           | Courses Taken                          | Completion Date | Certifications Awarded      | |-------------|----------------|----------------------------------------|-----------------|----------------------------| | 101         | Priya Sharma   | Leadership in Operations              | 2022-12-10      | Certified Operations Manager | | 102         | Rohit Mehra    | Data Analytics for Logistics          | 2021-11-15      | Certified Logistics Analyst | | 103         | Anjali Das     | HR Management Essentials              | 2023-03-05      | Certified HR Professional 
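The printed content still carries Markdown markup: bold markers, bullet dashes, and table pipes. One possible cleanup, applied to the raw lines before they are joined, is sketched below. The function name `clean_markdown_lines` and the exact rules are assumptions; they cover only the markup visible in this output.

```python
import re


def clean_markdown_lines(lines):
    """Join Markdown lines into plain text.

    Drops blank lines and table separator rows, strips leading bullet
    markers and bold markers, and turns table rows into comma-separated
    cells. (Hypothetical helper, not part of the original notebook.)
    """
    cleaned = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Skip table separator rows such as |-----|-----|
        if "-" in line and re.fullmatch(r"\|?[\s:|-]+\|?", line):
            continue
        line = re.sub(r"^[-*+]\s+", "", line)  # leading bullet marker
        line = line.replace("**", "")          # bold markers
        if line.startswith("|"):
            cells = [cell.strip() for cell in line.strip("|").split("|")]
            line = ", ".join(cell for cell in cells if cell)
        cleaned.append(line)
    return " ".join(cleaned)


sample = [
    "- **Annual Leave:** 18 days of paid leave per year.",
    "",
    "| ID | Name |",
    "|----|------|",
    "| 101 | Priya Sharma |",
]
print(clean_markdown_lines(sample))
```

In the loader, this could replace the `" ".join(...)` expression by passing `lines[1:]` to the helper.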

###**Summarize each policy using the BART model**

In [6]:
from transformers import pipeline

# Load summarizer
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Loop through and summarize each policy section
for title, content in all_policies_content.items():
    print(f"\n📄 {title}")
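    # BART accepts at most 1024 tokens. A word count only approximates the
    # token count, so trim long text first and also pass truncation=True,
    # which lets the tokenizer cut any input that is still too long.
    if len(content.split()) > 900:
        content = " ".join(content.split()[:900])
    # max_length and min_length are counted in tokens, not words
    summary = summarizer(content, max_length=60, min_length=15, do_sample=False, truncation=True)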
    print("📝 Summary:", summary[0]['summary_text'])


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
Device set to use cpu



📄 Leave Policies
📝 Summary: L - Annual Leave:** 18 days of paid leave per year, accrued monthly. - Maternity Leave: ** 6 months ofpaid leave for expecting mothers. - Paternity leave:** 15 days of Paid Leave for new fathers. - Compensatory Leave:  Leave granted for

📄 Training and Development
📝 Summary: Employee ID is the name and name of the person holding the employee ID. The employee ID is also the name of that person's course of study. Courses taken include Leadership in Operations, HR Management Essentials, Data Analytics for Logistics and HR Professional.

📄 Employee Benefits
📝 Summary: Health Insurance:** Covers employee and dependents up to ₹5,00,000. Provident Fund:** 12% of basic salary contributed to the PF account. Gratuity:** Paid on retirement/resignation based on tenure.

📄 Holiday Calendar
📝 Summary: Holi, Diwali, Raksha Bandhan, Ganesh Chaturthi, Makar Sankranti, Eid al-Fitr, Christmas, Independence Day, Good Friday, Republic Day are among the festivals on this list.
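The word-count trim in the summarization cell keeps only the first 900 words, so anything after that never reaches the model. An alternative is to split long text into chunks and summarize each one. The sketch below assumes a 400-word chunk size, and the names `chunk_words` and `summarize_long` are hypothetical; the summarizer is passed in as a plain function so the logic can be tried without loading a model.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 400) -> List[str]:
    """Split `text` into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_long(text: str, summarize: Callable[[str], str],
                   max_words: int = 400) -> str:
    """Summarize each chunk separately and join the partial summaries."""
    return " ".join(summarize(chunk) for chunk in chunk_words(text, max_words))


# With the notebook's pipeline, assuming it is loaded as `summarizer`:
# bart = lambda t: summarizer(t, max_length=60, min_length=15,
#                             do_sample=False, truncation=True)[0]["summary_text"]
# print(summarize_long(all_policies_content["Leave Policies"], bart))
```

Joining partial summaries is the simplest choice; for very long documents the joined text could itself be summarized again.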

