# SetFit for Multilabel Text Classification

In this notebook, we'll learn how to do few-shot text classification on a multilabel dataset with SetFit.

## Setup

If you're running this notebook on Colab or some other cloud platform, you will need to install the `setfit` library. Run the following cell to do so:

In [None]:
%pip install setfit


To be able to share your model with the community, there are a few more steps to follow.

First, you have to store your authentication token from the Hugging Face Hub (sign up [here](https://huggingface.co/join) if you haven't already!). To do so, execute the following cell and input an [access token](https://huggingface.co/docs/hub/security-tokens) associated with your account:

In [None]:
from huggingface_hub import notebook_login

notebook_login()


Then you need to install Git-LFS, which you can do by running the following command:

In [None]:
!apt install git-lfs

Reading package lists... Done
Building dependency tree       
Reading state information... Done
git-lfs is already the newest version (2.9.2-1).
0 upgraded, 0 newly installed, 0 to remove and 34 not upgraded.


Finally, you may need to configure Git on your system by providing details about who you are:

In [None]:
!git config --global user.email "agarcf15@estudiantes.unileon.es"
!git config --global user.name "agarcf15"

This notebook is designed to work with any multilabel [text classification dataset](https://huggingface.co/models?pipeline_tag=text-classification&sort=downloads) and pretrained [Sentence Transformer](https://huggingface.co/models?library=sentence-transformers&sort=downloads) on the Hub. Change the values below to try a different dataset or model!

In [None]:
from datasets import load_dataset

model_id = "sentence-transformers/paraphrase-mpnet-base-v2"
dataset = load_dataset("agarc15/CYULEKAGGLE")


In [None]:
dataset

DatasetDict({
    train: Dataset({
        features: ['Description', 'Type', 'Data destruction', 'Defacement', 'Denial of service', 'Doxing', 'Espionage', 'Financial Theft', 'Sabotage'],
        num_rows: 481
    })
})

## Loading and sampling the dataset

Most datasets on the Hub have many more labeled examples than one encounters in few-shot settings. To simulate the effect of training on a limited number of examples, let's subsample the training set by drawing 8 labeled examples for each label. Note that `np.random.choice` samples with replacement by default, so the same row can be drawn more than once.

Note that if your dataset has differently formatted labels, you may need to adapt this section.
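
As one illustration of such an adaptation, suppose each row stores a list of label names instead of one 0/1 column per label. The sketch below converts that layout into the indicator columns used in this notebook. The `attack_types` field name, the toy records, and the `to_indicator_columns` helper are all hypothetical and only serve the example.

```python
# Hypothetical input format: each record lists the names of its active labels.
records = [
    {"Description": "Site homepage replaced by attackers", "attack_types": ["Defacement"]},
    {"Description": "Customer database leaked and wiped", "attack_types": ["Doxing", "Data destruction"]},
    {"Description": "Routine patch announcement", "attack_types": []},
]


def to_indicator_columns(record, label_names, source_key="attack_types"):
    """Return a copy of `record` with one 0/1 column per label name."""
    active = set(record[source_key])
    # Keep every original field except the list of label names.
    converted = {key: value for key, value in record.items() if key != source_key}
    for name in label_names:
        converted[name] = int(name in active)
    return converted


# Collect every label name that occurs in the data, in a stable order.
label_names = sorted({name for record in records for name in record["attack_types"]})
converted_records = [to_indicator_columns(record, label_names) for record in records]

print(label_names)
print(converted_records[1])
```

After this step every record has the same shape as the rows shown above: a text column plus one binary column per label.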

In [None]:
import numpy as np

features = dataset["train"].column_names
features.remove("Type")
features.remove("Description")
features

['Data destruction',
 'Defacement',
 'Denial of service',
 'Doxing',
 'Espionage',
 'Financial Theft',
 'Sabotage']

In [None]:
num_samples = 8
samples = np.concatenate(
    [np.random.choice(np.where(dataset["train"][f])[0], num_samples) for f in features]
)
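
Because `np.random.choice` draws with replacement by default and is unseeded here, the selection can contain repeated rows and changes between runs. The following standard-library sketch shows one possible alternative: a seeded draw without replacement that also copes with labels that have fewer positives than requested. The `sample_per_label` helper and the toy columns are hypothetical and not part of SetFit or this notebook.

```python
import random


def sample_per_label(columns, num_samples, seed=0):
    """Pick up to `num_samples` distinct row indices for each label column."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    picked = []
    for name, values in columns.items():
        positives = [index for index, value in enumerate(values) if value]
        # Never ask for more rows than the label actually has.
        k = min(num_samples, len(positives))
        picked.extend(rng.sample(positives, k))
    return picked


# Toy 0/1 label columns standing in for dataset["train"][feature].
toy_columns = {
    "Espionage": [1, 0, 1, 1, 0, 1],
    "Sabotage": [0, 1, 0, 1, 0, 0],
}
toy_samples = sample_per_label(toy_columns, num_samples=2)
print(toy_samples)
```

A row can still appear twice in the result when it is positive for several labels, since each label is sampled independently.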

We encode the attack-type labels in a single `labels` feature.

In [None]:
def encode_labels(record):
    return {"labels": [record[feature] for feature in features]}


dataset = dataset.map(encode_labels)


Next, we use the samples we selected as our training set, and the remaining rows as our test set (since this dataset only has a `train` split on the Hub).

Here we have 56 training examples in total, since the dataset has 7 labels and we drew 8 examples for each.

In [None]:
train_dataset = dataset["train"].select(samples)
eval_dataset = dataset["train"].select(
    np.setdiff1d(np.arange(len(dataset["train"])), samples)
)
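
The split logic can be checked on toy indices. The sketch below uses plain Python sets as a stand-in for `np.setdiff1d`; the toy indices and sizes are made up for illustration.

```python
num_rows = 10                     # toy dataset size (the real one has 481 rows)
toy_samples = [3, 7, 3, 1, 7, 9]  # drawn indices; repeats are possible

# Training rows: one entry per drawn index, repeats included.
train_indices = toy_samples
# Evaluation rows: every index that was never drawn.
eval_indices = sorted(set(range(num_rows)) - set(toy_samples))

print(eval_indices)  # [0, 2, 4, 5, 6, 8]

# No row is shared between the two splits.
assert not set(train_indices) & set(eval_indices)

# Size of the training set in this notebook: 8 draws for each of 7 labels.
num_labels, num_samples = 7, 8
print(num_labels * num_samples)  # 56
```

Repeated draws shrink the number of distinct training rows, so the evaluation set holds every row that was never selected.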

Okay, now we have the dataset, let's load and train a model!

## Fine-tuning the model

To train a SetFit model, the first thing to do is download a pretrained checkpoint from the Hub. We can do so by using the `from_pretrained()` method associated with the `SetFitModel` class.

**Note that the `multi_target_strategy` parameter here signals to both the model and the trainer to expect a multi-labelled dataset.**
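
To make the idea concrete: a one-vs-rest strategy treats every label as its own yes/no decision, so a single text can receive zero, one, or several labels. The sketch below illustrates only that decision rule. It is not SetFit's internal code; the scores are made up and the 0.5 threshold is an assumption.

```python
features = [
    "Data destruction",
    "Defacement",
    "Denial of service",
    "Doxing",
    "Espionage",
    "Financial Theft",
    "Sabotage",
]


def one_vs_rest_decide(label_scores, threshold=0.5):
    """Threshold each label's score independently, yielding a multi-hot row."""
    return [int(score >= threshold) for score in label_scores]


def active_labels(multi_hot, label_names):
    """Map a multi-hot row back to the names of the labels that are switched on."""
    return [name for name, flag in zip(label_names, multi_hot) if flag]


# Made-up per-label scores for a single text, one score per entry in `features`.
toy_scores = [0.10, 0.05, 0.20, 0.70, 0.85, 0.30, 0.02]
multi_hot = one_vs_rest_decide(toy_scores)

print(multi_hot)                           # [0, 0, 0, 1, 1, 0, 0]
print(active_labels(multi_hot, features))  # ['Doxing', 'Espionage']
```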

In [None]:
from setfit import SetFitModel

model = SetFitModel.from_pretrained(model_id, multi_target_strategy="one-vs-rest")


model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. You should TRAIN this model on a downstream task to use it for predictions and inference.


Here, we've downloaded a pretrained Sentence Transformer from the Hub and added a logistic classification head to create the SetFit model. As indicated in the message, we need to train this model on some labeled examples. We can do so by using the `SetFitTrainer` class as follows:

In [None]:
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitTrainer

trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss_class=CosineSimilarityLoss,
    num_iterations=20,
    num_epochs=4,  # The number of epochs to use for contrastive learning
    batch_size=8,  # With a batch size above 8 I run out of VRAM
    column_mapping={"Description": "text", "labels": "label"},
)

The main arguments to notice in the trainer are the following:

* `loss_class`: The loss function to use for contrastive learning with the Sentence Transformer body
* `num_iterations`: The number of text pairs to generate for contrastive learning
* `column_mapping`: The `SetFitTrainer` expects the inputs to be found in a `text` and `label` column. This mapping automatically formats the training and evaluation datasets for us.
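
These settings determine how much contrastive training data is produced. The arithmetic below is a rough sketch that assumes each iteration yields one positive and one negative pair per training example. Under that assumption it reproduces the `Num examples = 2240` and `Total optimization steps = 1120` values reported in this notebook's training log.

```python
num_train_examples = 56  # 8 draws for each of the 7 labels
num_iterations = 20
num_epochs = 4
batch_size = 8

# Assumption: one positive and one negative pair per example and iteration.
num_pairs = num_train_examples * num_iterations * 2
steps_per_epoch = num_pairs // batch_size
total_steps = steps_per_epoch * num_epochs

print(num_pairs, steps_per_epoch, total_steps)  # 2240 280 1120
```

Doubling `num_iterations` or `num_epochs` therefore doubles the number of optimization steps, which is worth keeping in mind on limited hardware.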

Now that we've created a trainer, we can train it!

In [None]:
trainer.train()

Applying column mapping to training dataset



***** Running training *****
  Num examples = 2240
  Num epochs = 4
  Total optimization steps = 1120
  Total train batch size = 8



The final step is to compute the model's performance using the `evaluate()` method. The default metric is subset accuracy: the fraction of samples for which all 7 labels are predicted correctly.

In [None]:
metrics = trainer.evaluate()
metrics

Applying column mapping to evaluation dataset
***** Running evaluation *****



{'accuracy': 0.5981735159817352}
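
Subset accuracy is a strict metric: a prediction with a single wrong label counts as a complete miss. The toy sketch below, using made-up label vectors and hypothetical helper names, contrasts it with the fraction of individual label decisions that are correct.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of rows whose predicted label vector matches exactly."""
    exact_matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact_matches / len(y_true)


def label_accuracy(y_true, y_pred):
    """Fraction of individual label decisions that are correct."""
    correct = sum(
        int(t == p)
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    total = sum(len(row) for row in y_true)
    return correct / total


# Made-up ground truth and predictions for four texts and three labels.
y_true = [[1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]

print(subset_accuracy(y_true, y_pred))  # 0.75
print(label_accuracy(y_true, y_pred))
```

Only one of the twelve label decisions is wrong here, yet a quarter of the rows count as errors under subset accuracy.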

And once the model is trained, you can push it to the Hub:

In [None]:
trainer.push_to_hub("agarc15/TESTMULTI")


'https://huggingface.co/agarc15/TESTMULTI/tree/main/'

You can now share this model with all your friends, family, and favorite pets. They can all load it with the identifier `your-username/the-name-you-picked`, for instance:

In [None]:
from setfit import SetFitModel

model = SetFitModel.from_pretrained("agarc15/TESTMULTI")


Run inference on a few article texts. With `multi_target_strategy="one-vs-rest"`, the model should return one 0/1 indicator per label for each input, in the same order as `features`.

In [None]:
preds = model(
    [
        " Application Programming Interface (API) adoption is steadily increasing in the healthcare sector, but APIs do not come without cybersecurity risks. In fact, Gartner predicted that API attacks would become the most common attack vector by 2022.. In healthcare, evidence suggests that API adoption could revolutionize interoperability efforts and health data exchange. In addition, providers are increasingly implementing APIs to comply with the CMS Interoperability and Patient Access final rule. Meanwhile, the HL7 Fast Healthcare Interoperability Resources (FHIR) standard is quickly gaining recognition in the health IT space.. In a recent report, Imperva partnered with the Marsh McLennan Global Cyber Risk Analytics Center to analyze API-related incident data and quantify the cost of API insecurity. Researchers discovered that the lack of security APIs may cause $12 billion to $23 billion in average annual API-related cyber loss in the US and anywhere from $41 billion to $75 billion globally.. Dig Deeper. Yale New Haven Hospital Research File Implicated in Healthcare Data Breach. 2 Texas Hospitals Infected With Malicious Code May Face PHI Exposure. Meta Sued For Violating Patient Privacy, Scraping Health Data From Hospitals. “These estimates provide a view on losses that are entirely avoidable,” the report suggested.. “If companies made an upfront investment in properly securing all of their APIs, their API-related losses could decrease significantly even as their API adoption continues to increase.”. The research also revealed a correlation between company revenue and API-related event frequency. Companies earning more than $100 billion in revenue attributed a quarter of their cyber events (during the analysis period) to API insecurity.. “The analysis indicates that large firms face an elevated risk of experiencing an API-related incident. 
This is likely due to increased deployment and utilization of APIs in large companies, which could expose companies to more potential breaches,” the report noted.. The report found healthcare to be one of the biggest adopters of APIs across all sectors. Healthcare API traffic grew by more than 400 percent in 2020, and health monitoring API use increased an additional 941 percent in 2021.. Notably, the healthcare sector was not close to having the highest API event count compared to other industries. Technology-dependent industries such as professional services and retail trade experienced a far higher volume of API events.. The healthcare sector reported less than 25 API-related incidents during the analysis period.. “While some of these discrepancies may be attributed to higher API security standards, it is also likely that the elevated incidence of other cyberattacks, such as lost or stolen data and ransomware in the Healthcare industry, depresses their sector’s API-related event frequency,” the report reasoned.. The industries with the strongest API-related security controls had the least incidents. The report predicted that companies would continue to see rising API-related costs as cybersecurity concerns rise. Having a robust security architecture can help healthcare organizations mitigate these risks.. “Since 2017, API-related events have become increasingly common, impacting a plethora of companies across disparate industries, revenue bands, and geographies,” the report stated.. “This rise—coinciding with a meteoric increase in competing cyber threats, such as ransomware attacks—threatens to compound the already spiraling costs impacting both businesses and insurers.”. Tagged Application Programming Interfaces Cybersecurity Data Security Focus On Disaster Preparedness. ",
        "The Cybersecurity and Infrastructure Security Agency (CISA) issued an Emergency Directive on PrintNightmare July 13th, raising concerns for health IT security leaders. The Emergency Directive alerts all US civilian agencies, including the health IT security sector, to immediately stop service to their Microsoft Windows print spooler and deploy a fix.  The Microsoft print spooler service vulnerability, nicknamed PrintNightmare,  is “being actively exploited,” the CISA alert states. Dig Deeper. OIG: Gaps in CMS ERM Puts Genomic Data Security at Risk HHS Warns Health PACS: Patient Data Vulnerable to Cyber Exploitation Report Draws Patient Privacy Concern with Prenatal Test. An attacker could “take control of an affected system,” CISA stated in a previous alert.  Now, CISA is advising all federal, civilian agencies to “to immediately disable the print spooler service on Microsoft Active Directory Domain Controllers, apply the Microsoft July 2021 cumulative updates, and make additional configuration changes to all Microsoft Windows servers and workstations within one week.” The exploitation “of the vulnerability allows an attacker to remotely execute code with system level privileges, enabling a threat actor to quickly compromise the entire identity infrastructure of a targeted organization,” the alert states.  This emergency alert comes a direct “response to validated active exploitations. CISA is concerned that exploitation of this vulnerability may lead to full system compromise of affected agency networks if left unmitigated.” “Since this exploitation was identified, CISA has been engaged with Microsoft and federal civilian agencies to assess potential risk to federal agencies and critical infrastructure,” CISA’s Executive Assistant Director for Cybersecurity Eric Goldstein said in the statement.  
“CISA’s mission is to protect the nation against cybersecurity threats, and this directive reflects our determination to require emergency action for exploitations that pose an unacceptable risk to the federal civilian enterprise. We will continue to actively monitor exploitation of this vulnerability and provide additional guidance, as appropriate.” The directive is for federal agencies, but CISA is encouraging both public and private sector organizations to review the alert and consider steps to mitigate any vulnerability.  The American Hospital Association received the initial Microsoft alert July 1st and responded, noting that any cyberattack on healthcare facilities and systems would be disruptive.  “This critical vulnerability has the potential to be highly disruptive for hospitals and health systems,” John Riggi, AHA's senior advisor for cybersecurity and risk, said in a July 2nd statement.  However, healthcare institutions do not have the same options as other sectors.  “Simply disabling print services in hospitals and health systems is not an option as we have already heard from multiple sources in the field,” Riggi stated. “Printing services are used for everything from printing patient identification wristbands to labels for IV medications. Continuing essential patient care services must be balanced with the potential for remote exploitation of this vulnerability.” On July 6th and 7th, Microsoft released updates to fix the issue.  “Microsoft has completed the investigation and has released security updates to address this vulnerability,” the Microsoft update summary states. “Please see the Security Updates table for the applicable update for your system. We recommend that you install these updates immediately.”  Tagged Cyber Hygiene Cybersecurity Data Encryption Data Privacy. ",
        " The law firm says it alerted the FBI of the incident and posted a breach notification on its website Sunday because the investigation thus far determined that certain information relating to individuals was accessed by the unauthorized actor.. In addition to Apple and Pfizer, the firm's clients include dozens of Fortune 500 and Global 500 companies, such as Marriott International, Boeing, British Airways, Allianz Insurance, Johnson & Johnson and Mercedes Benz.. The law firm says that the attack has now been contained and that there is no active threat to the firm’s network. It did not specify if any data was exfiltrated or leaked. It reported that one of the systems accessed by the hackers that contained sensitive personal information was encrypted by the intruders.. Campbell Conroy & O’Neil told Information Security Media Group that it became aware of unusual activity on its network on Feb. 27 and conducted an investigation that determined ransomware was involved. Regarding the delay in reporting, it said: It takes time to review the data accessed by the unauthorized actor and to determine notification obligations.. The law firm told ISMG that it was the target of a ransomware attack which prevented access to certain files on the system. In response, Campbell began working with third-party forensic investigators to investigate the full nature and scope of the event and also alerted the FBI of the incident.. The firm has notified individuals whose information was accessed by the unauthorized actor. It says the breached system contained individuals’ names, dates of birth, driver’s license numbers/state identification numbers, financial account information, Social Security numbers, passport numbers, payment card information, medical information, health insurance information, biometric data and/or online account credentials, such as usernames and passwords. 
The firm did not specify how many individuals were affected, simply stating, “a limited number of data types were determined to be accessible.. Campbell Conroy & O’Neil is offering 24 months of prepaid credit monitoring, fraud consultation, and identity theft restoration services to individuals whose Social Security numbers or the equivalent were accessible as a result of the security incident. It says it's working with third-party forensic investigators to investigate the full nature and scope of the event and to determine what information may have been exposed.. The law firm tells ISMG that it's reviewing its policies and procedures and working to implement additional safeguards to further secure its information systems, saying its systems are now fully operational and it does not anticipate any significant impact to ongoing litigation nor to our representation of our valued clients.. Others at Risk?. If data on Fortune 500 companies, was, indeed exposed in this breach, it could open the door to other breaches, says Javvad Malik, security awareness advocate at KnowBe4.. Cybercriminals are increasingly stealing data that they can use to fuel other attacks, Malik says. Because of this, we're seeing more organizations targeted which have traditionally not been on criminals’ radars, he says. This is why it's important that organizations of all sizes and across all industry verticals invest in robust cybersecurity controls, which encompass the technologies, processes and people to reduce the likelihood of becoming victims.. Trevor J. Morgan, product manager at German data security company comforte AG, adds: Law firms and legal service providers - such as processors of legal discovery data - should be paying attention to this breach and immediately assessing their defensive posture. 
If you’re one of these organizations, you should be asking whether your sensitive data resides in a vulnerable clear state behind what you believe is a well-protected perimeter, or whether you apply some form of data-centric security to it.. Reward for Reporting. In response to the current surge in ransomware and other cyberattacks, the U.S. Department of State said last week it will offer rewards of up to $10 million for information about cyberthreats to the nation's critical infrastructure (see: US Offering $10 Million Reward for Cyberthreat Information).. The Cybersecurity and Infrastructure Security Agency also launched a new ransomware resources website called StopRansomware for businesses, individuals and organizations.",
        " Meanwhile, U.S. Sen. Marco Rubio, R-Fla., has requested that the FBI provide all assistance necessary for the investigation, calling the incident a national security issue.. I will be asking the @FBI to provide all assistance necessary in investigating an attempt to poison the water supply of a #Florida city.. This should be treated as a matter of national security.. https://t.co/XhGNLplNpr via @vice. — Marco Rubio (@marcorubio) February 8, 2021. In the aftermath of the Florida hacking incident, the Cybersecurity and Infrastructure Security Agency on Feb. 11 issued a warning to operators of other plants to be on the lookout for hackers who exploit remote access software and outdated operating systems.. Warner's Requests. In a letter released Wednesday, Warner, the chair of the Senate Intelligence Committee, notes that the Florida incident has raised broad cybersecurity concerns about the nation's critical infrastructure.. This incident has implications beyond the 15,000-person town of Oldsmar, the Virginia senator writes. While the Oldsmar water treatment facility incident was detected with sufficient time to mitigate serious risks to the citizens of Oldsmar, and appears to have been identified as the result of a diligent employee monitoring this facility's operations, future compromises of this nature may not be detected in time.. In the letter, Warner demands information on three specific aspects of the investigation.. He asks the FBI for an update on any details about the hack it has uncovered.. In addition, he asks the EPA if the Oldsmar water treatment facility was compliant with the most recent version of the Water and Wastewater Sector-Specific Plan, a framework for developing security and resilience plans for these types of plants that are considered critical infrastructure. The senator also asks the agency if the plan, which was last updated in 2015, needs revisions to address the types of cybersecurity concerns raised by the Florida incident.. 
Finally, he asks for confirmation that agencies are sharing threat intelligence about this incident with other water treatment facilities throughout the U.S. as well as other organizations that are considered part of the nation's critical infrastructure.. Other critical infrastructure sectors, such as healthcare, emergency services, energy, food and agriculture and transportation systems, depend on the cyber resilience of water facilities, Warner notes.. A spokesperson for the FBI could not be immediately reached for comment about the letter. An EPA spokesperson says the agency received the letter and would respond.. Hacker Gained Remote Access. Officials in Oldsmar, Florida, say a hacker gained remote access to a system to increase the amount of lye in the city's water system, but the attack was immediately thwarted (see: 5 Critical Questions Raised by Water Treatment Facility Hack).. As part of the initial investigation, officials found that the facility's staff routinely used TeamViewer to remotely gain access to some systems.. Some computers at the Florida plant reportedly were network-connected to the supervisory control and data acquisition - aka SCADA - system and were running outdated 32-bit versions of Windows 7, which is no longer supported by Microsoft (see: Florida City's Water Hack: Poor IT Security Laid Bare).. More Questions. Investigators and lawmakers should be asking the EPA and other agencies even more questions, says Mike Hamilton, a former vice chair of the Department of Homeland Security's State, Local, Tribal, and Territorial Government Coordinating Council.. For example, he says the EPA should clarify whether it has established criteria and funding for cyber risk management at water treatment plants.. And he says the FBI and EPA should be pressed to determine if the hacker who targeted the Florida plant exploited a specific vulnerability or just gained access to system credentials.. 
The investigation should seek to establish the taxonomic identity of the threat actor or actors, such as whether this was opportunistic, hacktivist, criminal or nation-state activity, as this will help to understand the motivation of the actors and the likelihood of further events, says Hamilton, who now serves as CISO of CI Security. Secondarily, investigators should review the guidance provided by the EPA and other agencies as to security requirements and the extent to which those requirements span the information and operational technology demarcation point.. Austin Berglas, who was an assistant special agent in charge of cyber investigations at the FBI's New York office and is now the global head of professional services at cybersecurity firm BlueVoyant, says the FBI is likely investigating the incident from two separate angles.. The FBI’s focus for this investigation will be to first, ensure that the incident is contained and there is no ongoing threat to the water treatment plant, Berglas says. Second, the FBI will attempt to attribute the attack to a specific group or individual through analysis of indicators of compromise and the tactics, techniques, and procedures used by the attacker.", 
        " There has been a continued increase in the number of organizations utilizing PHI sharing through Direct exchange, with a 15 percent increase in the number of trusted Direct addresses able to share PHI, according to a DirectTrust statement.. The organization added that there was a 68 percent increase in the number of healthcare organizations served by DirectTrust health information service providers (HISPs) and engaged in Direct exchange.. It is very satisfying to see the demand for Direct grow and to witness the physician and provider community further embracing the use of Direct exchange for secure messaging throughout hospitals and medical practices, DirectTrust President and CEO David. C. Kibbe, MD MBA, said in the release. Additionally, vendors across the health IT industry are increasingly enhancing their usability for Direct, while health care providers are broadening their use of Direct beyond clinical messaging to include administrative and research communications.  Dig Deeper. DirectTrust Addresses Secure Messaging Adoption Barriers. DirectTrust Voices Concern for Cybersecurity in Healthcare. Secure Email Key in New DirectTrust Patients Program. The end of 2017 Q2 also saw a 74 percent increase in Direct exchange transactions, totaling 40.1 million. Furthermore, Direct exchange transactions rose to over 241 million at the end of the second quarter, DirectTrust explained.. Five healthcare organizations have joined DirectTrust since April 1, including a digital health technology company, a messaging app, and a practice management and electronic billing solutions firm. The total membership of DirectTrust organizations is now 129.. The newest members are the following:. Mirah, Inc.. vitaTrackr, Inc. TechSoft, Inc. PatientMD Care3, Inc. 
Kibbe added that “DirectTrust continues to attract organizations that bring innovative technology and deep knowledge in health care information exchange and interoperability.” Current members will continue to prosper and grow their interoperable exchange with the newest organizations joining the DirectTrust network.. DirectTrust also reported growth from Q1 2016 to Q1 2017, with the number of addresses using Direct for PHI sharing rose 21 percent to 1.4 million.. Direct also reported 35.6 million Direct exchange transactions in Q1 2017, a 76 percent increase over Q1 2016. The non-profit entity added that it predicted 140 million transactions by the end of 2017.. We believe health care providers and their organizations are beginning to learn how to optimize secure data transport via Direct by combining it with more reliable and useful content, and with better workflows for care coordination,” Kibbe explained in an April 2017 statement. “Whether used for peer-to-peer messaging, for transport of lab results, to send data to clinical repositories, or to combine clinical file attachments with billing statements, Direct interoperability is replacing fax and mail because it is more secure, less costly, and can be tracked much more easily within EHRs and other applications.. Direct messaging options can be especially beneficial for healthcare, and is quickly becoming a popular option with the interoperability push and need for secure data exchange.. DirectTrust is a non-profit trade alliance that facilitates secure HIE through the Direct Protocol. It also forms secure HIE policies and standards.. However, Direct does not just focus on secure email. Information can be transported via Direct from server to server, or from server to endpoint person.. Direct messaging “has a lot more capability than simply to be used as a means of person-to-person communication,” Kibbe said in a previous interview with HealthITInteroperability.com.. 
“Health information exchanges all over the country use Direct exchange to send alerts to a medical practice when a patient whom the HIE has received an ADT message about has either been admitted to the hospital or is about to leave the hospital or the emergency room,” he noted.. Devices can also have Direct access to an endpoint, and can then send the information contained in the device's output to a server. Organizations might also opt for Direct as incentives for health data exchange increases, such as through the financial incentives under MACRA, Kibbe continued.. “The Comprehensive Primary Care Plus is a great example where there is a strong incentive on the part of the participants in those programs to move the data quickly and securely and electronically as opposed to slowly and by paper or fax,” Kibbe said. “The workflow has to be accomplished quickly and Direct is a very, very good way to do it quickly.”. Tagged Direct Secure Messaging DirectTrust. ",
        " On Monday, four U.S. Democratic lawmakers called for legislation or an executive order to crack down on privately built spyware. They also called for consideration of potential sanctions again all individuals and organizations that sell such software.. Enough is enough. The recent revelations regarding misuse of the NSO Group's software reinforce our conviction that the hacking-for-hire industry must be brought under control, according to the joint statement from Reps. Tom Malinowski of New Jersey, Katie Porter and Anna G. Eshoo of California, and Joaquin Castro of Texas.. Private companies should not be selling sophisticated cyber-intrusion tools on the open market, and the United States should work with its allies to regulate this trade, they said.. Meanwhile, French President Emmanuel Macron last week reportedly called Israeli Prime Minister Naftali Bennett to demand a thorough government investigation into the use of Pegasus spyware, including how such software gets approved for export and subsequently policed.. Alleged Targeting List. The controversial software is again in the limelight following allegations that a list of customers' supposed targets included contact details for 50,000 individuals.. Named on that list were Macron; the presidents of Iraq and of South Africa; the prime ministers of Egypt, Morocco and Pakistan; seven former prime ministers who were in office when their names were added to the list; and the king of Morocco.. The list was obtained as part of data leaked to French nonprofit journalism group Forbidden Stories. Working with technical experts at rights group Amnesty International and 17 media organizations as part of a joint Pegasus Project, the group began publishing details of its monthslong research effort on July 18.. 
In the Thursday call, Bennett assured Macron that he would launch a high-level investigation, while also emphasizing that the alleged behavior took place before Bennett became Israel's prime minister, Israel's Channel 12 News reported Saturday.. On Thursday, the Foreign Affairs and Defense Committee of Israel's Knesset - aka parliament - created a committee to probe the use of Pegasus spyware by foreign governments and whether Israel's export-control checks on who gets granted a license to use the software need to be tightened, The Times of Israel reported.. The defense establishment appointed a review committee made up of a number of bodies, lawmaker Ram Ben-Barak, the former deputy head of Israel's Mossad intelligence agency, told Army Radio on Thursday, The Times of Israel reported. When they finish their review, we'll demand to see the results and assess whether we need to make corrections.. Israeli Defense Minister Benny Gantz, in a trip reportedly planned some time ago, is scheduled to travel to France on Wednesday to discuss issues with French Defense Minister Florence Parly. In a statement, the Israeli government said those discussions will also now focus on NSO Group, Haaretz reported.. NSO Group Denies Allegations. How the leaked data was obtained remains unclear, as does the purpose of the apparent targeting list. Forbidden Stories says the list includes 50,000 individuals' contact details - across 50 countries - amassed by these 10 Pegasus-using governments: Azerbaijan, Bahrain, Hungary, India, Kazakhstan, Mexico, Morocco, Rwanda, Saudi Arabia and the United Arab Emirates.. How many of these apparent targets of interest were targeted with Pegasus spyware remains unknown. While Amnesty International was able to study some smartphones for signs of infection, it has been unable to obtain access to devices used by the vast majority of individuals on the list.. 
NSO Group has continued to deny that the list was in any way a master list of individuals being targeted. The company has claimed to have about 45 government customers and says each only targets about 100 individuals per year.. We would like to emphasize that NSO sells it technologies solely to law enforcement and intelligence agencies of vetted governments for the sole purpose of saving lives through preventing crime and terror acts, the company said in a statement issued last week. NSO does not operate the system and has no visibility to the data.. But security experts who track spyware have asked: If NSO has no visibility into the data, how does it investigate claims that its software has been misused, for example, by autocratic regimes to spy on citizens? On Wednesday, for example, Chaim Gelfand, the chief compliance officer at NSO Group, told Israeli television network i24 that it could specifically come out and say for sure that the president of France, Macron, was not a target.. Exactly how NSO Group reviews such allegations - and if it does so proactively, or only in response to reports by investigative journalists or other third parties - remains unclear.. Last week, a company spokesman told Information Security Media Group that whenever NSO Group investigates allegations of inappropriate use by a customer, they are obligated to provide us with such information.. Probes in France, Mexico and Beyond. French prosecutors, meanwhile, have launched their own investigation into the leak, following French investigative website Mediapart and satirical newspaper Le Canard Enchaine both filing complaints on July 19. Mediapart attributed the spying against it to Morocco's security services, saying the spying came after it published reports on how the North African kingdom targets journalists and human rights advocates.. 
Mexico's president, Andrés Manuel López Obrador, aka AMLO, has also launched a probe into the software, following revelations that the previous administration had used the software against him, his family members and advisers while he was running for president.. Mexican officials say they're reviewing the government's decision to purchase a license for Pegasus in 2014 for $32 million to see if graft was involved. Obrador has stated that the software now only gets used to conduct surveillance on criminals, rather than against political figures or journalists, AFP reported.. Reports that names on the list included 300 Indian journalists, politicians, lawyers and other citizens led opposition politicians to disrupt parliament on July 20, calling for a full investigation and answers from the government of Prime Minister Narendra Modi about whether it used such software, the Guardian reported.. In the runup to the 2019 national elections, in which Modi was reelected, his chief rival, Rahul Gandhi, as well as several aides and close friends, appear to have had their smartphones infected with Pegasus software, according to news reports.. In Hungary, opponents of the far-right government have called for an investigation into how the software has reportedly been used to target journalists and others.",
        " Lisa Monaco, the nation's second-highest-ranking attorney, told The Associated Press, In the weeks to come, you're going to see more arrests and the seizure of ransom payments issued in cryptocurrency, among other operations.. While Monaco did not offer specifics, she declared: If you come for us, we're going to come for you.. Assessing the state of ransomware crimes, generally, Monaco - who has taken an increasingly public role in pursuing threat actors - said, We have not seen a material change in the landscape. Only time will tell as to what Russia may do on this front.. Still, she added, We're going to continue to press forward to hold accountable those who seek to go after our industries, to hold our data hostage and threaten national security, economic security and personal security.. U.S. National Cyber Director Chris Inglis, however, told House lawmakers on Wednesday that the nation is seeing a discernible decrease in Russia-based cyberattacks.. Meg King, formerly an international manager for the U.S. Department of Defense’s Cooperative Threat Reduction Program, tells ISMG, We need to give [this strategy] time to work, and if one of our most seasoned cyber experts - National Cyber Director Chris Inglis - says the U.S. has seen a 'discernible decrease' in attacks emanating from Russia, I'm encouraged.. Rosa Smothers, a former CIA threat analyst and technical intelligence officer, tells ISMG, Aggressive extradition of cybercriminals to make an example of them, coupled with an aggressive bounty program, shows that the DOJ means business and is moving with a sense of urgency on the ransomware issue.. 
Smothers, currently the senior vice president of cyber operations at the firm KnowBe4, also notes, To put this into context, Thursday's announcement of a $10 million bounty for information leading to the identification or location of senior members of the DarkSide gang … is the same amount of money offered for Sirajuddin Haqqani … who is wanted for questioning in connection with the January 2008 attack on a hotel in Kabul, Afghanistan, that killed six people.. U.S. Deputy Attorney General Lisa Monaco at a press conference in October following a sting operation targeting darknet vendors (Source: U.S. Department of Justice). Alleged Cybercriminal Extradited to US. Monaco's statement comes after an alleged Russian hacker appeared in court in the U.S. last week after being extradited from South Korea on allegations of facilitating transnational cybercrime.. Vladimir Dunaev, 38, a Russian national, is alleged to have pushed TrickBot malware in global cyberattacks between 2015 and 2020 - in particular, targeting schools, government entities and financial institutions. Microsoft acted against the malware group last October, ultimately seizing control of its infrastructure.. According to the DOJ, Dunaev, who faces a maximum of 60 years in prison, is suspected to be a malware developer for the group. He has been charged with conspiracy to commit computer fraud and aggravated identity theft, along with money laundering, wire fraud and bank fraud.. Follow the Money. In June, the DOJ also announced that it had seized 63.7 bitcoins - then valued at $2.3 million - which was considered approximately half of the proceeds from the May ransom payment Colonial Pipeline Co. made to the DarkSide ransomware group. The attack, which led to the pipeline halting operations after finding its systems crypto-locked, resulted in fuel shortages on the East Coast (see: $2.3 Million of Colonial Pipeline Ransom Payment Recovered).. 
Commenting on that attack, Monaco noted at the time, Following the money remains one of the most basic, yet powerful tools we have. Ransom payments are the fuel that propels the digital extortion engine.. The U.S. government has advised against paying ransoms, suggesting they only embolden cybercriminals.. (Photo: Executium via Unsplash). More Actions by Monaco. The DOJ confirmed in October that it will pursue government contractors that fail to report cybersecurity incidents. Monaco said the department's Civil Cyber-Fraud Initiative will use the False Claims Act, which imposes liability on those defrauding government programs, to hold entities accountable for knowingly violating obligations to monitor and report incidents and breaches (see: US DOJ to Fine Contractors for Failure to Report Incidents).. Monaco also in October announced the creation of a National Cryptocurrency Enforcement Team, or NCET, which she said will investigate and prosecute the misuse of cryptocurrency - particularly crimes committed by crypto exchanges, mixing and tumbling services used to obfuscate funds, and money laundering infrastructure.. Crypto Focus. This month, the DOJ listed a job opening for the director of NCET, who will aid in enforcing digital currency laws and head a team of prosecutors to investigate crypto-related cases. The DOJ says the director will liaise with U.S. Attorneys' Offices and other law enforcement agencies, and partner with the Department of the Treasury's Financial Crimes Enforcement Network, or FinCEN; the Securities and Exchange Commission; and similar agencies around cryptocurrency regulation.. On targeting ransomware operators' cryptocurrency-based model, King, currently director of the science and technology innovation program at The Wilson Center, a nonpartisan think tank, says, Seizing cryptocurrency ransomware payments puts a big dent in the core of the business model: Criminals are no longer assured that they can keep proceeds. 
This is a critical element of an overall U.S. government strategy to deny ransomware attackers access to the tools they need to succeed.",
        " Hackers are again taking aim at the increased number of remote workers during the COVID-19 pandemic through two new phishing campaigns: one attack method targets Skype credentials, while the other leverages fake Zoom videoconferencing meeting notifications.. The reports come following an FBI alert that warned cybercriminals are targeting the US healthcare sector with COVID-19 phishing attacks.. First, Cofense researchers discovered hackers are spoofing Skype amid the spike in remote work. The phishing emails evaded detection in accounts protected by Microsoft 365 EOP and Proofpoint, making it to the users’ inboxes.. “With so many people working from home, remote work software like Skype, Slack, Zoom, and WebEx are starting to become popular themes of phishing lures. We recently uncovered an interesting Skype phishing email that an end user reported to [Cofense] Phishing Defense Center,” researchers explained.. “For this attack, the threat actor created an email that looks eerily similar to a legitimate pending notification coming from Skype. The threat actor tries to spoof a convincing Skype phone number and email address,” they continued.. READ MORE: WHO Reports COVID-19 Spurs Rapid Rise in Cyberattacks Against Staff. Though the sender address appears legitimate at first, the user can see the real sender address within the return-path display as “sent from”: researchers note this is really an external, compromised account.. The hackers are exploiting the compromised account to send more phishing campaigns disguised as messages from a trusted sender.. Researchers explained the threat actors are bank on urgency and curiosity, as many users may review unexpected notifications from the platforms their companies are leveraging for remote work during the Coronavirus crisis.. 
If a user clicks the malicious link, they’re shown an impersonated Skype login page that includes the recipient’s company logo on the login box and a disclaimer warning the page is for “authorized use.”. “The username is auto-filled due to the URL containing the base64 of the target email address, thus adding simplicity to the phishing page and leaving little room for doubt. The only thing left for the user to do is to enter his or her password, which then falls into the hands of the threat actor,” researchers explained.. READ MORE: Sens. to DHS CISA: Issue COVID-19 Cyber Threat Guidance for Healthcare. The phishing campaign is hosted by an “.app top level domain,” which allows app developers to securely share their apps through a required HTTPS. Users know to look for an HTTPs for a secure connection. But as the phishing attacks are hosted on this platform, users may not detect that it’s actually a malicious site.. The use of HTTPs for phishing campaigns is not new. The FBI first warned hackers were leveraging “secure” websites to trick users in 2019, telling organizations hackers were “more frequently incorporating website certificates – third-party verification that a site is secure – when they send potential victims emails that imitate trustworthy companies or email contacts.”. But hackers are also leveraging this type of campaign to target Zoom users, as well. Abnormal Security researchers detected phishing attacks posing as Zoom meeting notifications. The email requests the user join a meeting about their job termination, asking users to first log into a fake Zoom page that will actually steal their credentials.. The attack has been seen in more than 50,000 email inboxes, hosted by the Office365 platform. The phishing campaign primarily targets employees in hopes of taking advantage of the spike in remote work.. 
Much like the Skype campaign, these phishing attacks leverage impersonation: disguising the malicious emails as authentic Zoom email meeting notifications and banking on urgency to trick victims into clicking the link.. READ MORE: Cybercriminals Targeting US Providers with COVID-19 Phishing Attacks. The malicious landing page appears to be a legitimate “carbon copy” of a Zoom login page. Upon further inspection the only functioning feature of the page are the login fields used by the hackers to steal credentials.. Researchers stressed that most users would be “hard-pressed to understand” that the site was indeed malicious and not a legitimate Zoom page. Even frequent Zoom users might look at the login page, believe their session had expired, and attempt to sign in again.. “The email masquerades as an automated notification for an important meeting with HR regarding the recipient’s termination,” researchers explained. “The email contains a link to a fake Zoom login page hosted on ‘zoom-emergency.myftp.org.’ Links to the phishing page are hidden in text used in automated meeting notifications.”. “The email masquerades as a reminder that the recipient has a meeting with HR regarding their termination. When the victim reads the email they will panic, click on the phishing link, and hurriedly attempt to log into this fake meeting,” they continued. “Should recipients fall victim to this attack, login credentials as well as any other information stored on Zoom will be compromised.”. Zoom has remained a prime target for hackers throughout the pandemic, with the company itself facing backlash for multiple privacy issues, such as Zoombombing and other hacking efforts. In response, the videoconferencing platform has put its software development on hold and partnered with private sector stakeholders to improve the security of its platform.. 
The American Medical Association and American Hospital Association recently released telework guidance for the healthcare sector to help providers bolster their security and reduce some of these vulnerabilities.. Tagged Coronavirus Cybersecurity Employee Security Training Phishing Attacks Risk Management. "
    ]
)
preds

tensor([[0, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0]])

In [None]:
# Map each multi-hot prediction row to its label names.
# `features` must be the ordered list of label names used to build the training labels.
[[f for f, p in zip(features, ps) if p] for ps in preds]

[['HACK'], ['HACK'], ['HACK'], ['HACK'], [], ['HACK'], ['HACK'], ['HACK']]
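
The decoding step above can also be written as a small helper that checks the label list matches the width of the prediction rows. This is only a sketch: the label names below are placeholders, since the output above confirms only that index 2 corresponds to `HACK`, and the predictions are copied from the printed tensor as plain lists, so `decode_multilabel` is a hypothetical helper rather than part of SetFit.

```python
# Hypothetical label order: only "HACK" at index 2 is confirmed by the output above.
features = ["LABEL_0", "LABEL_1", "HACK", "LABEL_3",
            "LABEL_4", "LABEL_5", "LABEL_6", "LABEL_7"]

# Multi-hot predictions copied from the tensor printed above, as plain lists.
preds = [
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
]


def decode_multilabel(rows, label_names):
    """Return, for each row, the label names whose position is set to 1."""
    decoded = []
    for row in rows:
        # A width mismatch would silently drop labels in zip(), so fail early.
        if len(row) != len(label_names):
            raise ValueError(
                f"Row has {len(row)} entries but {len(label_names)} labels were given"
            )
        decoded.append([name for name, flag in zip(label_names, row) if int(flag)])
    return decoded


predicted_labels = decode_multilabel(preds, features)
print(predicted_labels)
```

An all-zero row, like the fifth one here, decodes to an empty list, meaning the model assigned none of the labels to that text. If `preds` is still a tensor, converting it first with `preds.tolist()` gives the same plain-list form used in this sketch.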