# Fine-Tuning a Language Model with Hugging Face Transformers: Question Answering

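We have already seen how to load a pretrained question-answering model with a Pipeline. This tutorial shows how to fine-tune a model for the question-answering task.

**Note: the fine-tuned model still answers a question by extracting a substring of the context; it does not generate new text.**

### Example of question answering with a model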

![Widget inference representing the QA task](docs/images/question_answering.png)

In [1]:
# Adjust these key parameters to match your model and the GPU resources available
squad_v2 = False
model_checkpoint = "distilbert-base-uncased"
batch_size = 16

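## Downloading the dataset

In this tutorial we use the [Stanford Question Answering Dataset (SQuAD)](https://rajpurkar.github.io/SQuAD-explorer/).

### The SQuAD dataset

The **Stanford Question Answering Dataset (SQuAD)** is a reading comprehension dataset made up of questions posed by crowdworkers on a set of Wikipedia articles. The answer to each question is a segment of text, or span, from the corresponding reading passage, or the question may be unanswerable.

SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, a system must not only answer questions when possible, but also determine when the paragraph supports no answer and abstain from answering.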
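To make the difference between answerable and unanswerable questions concrete, here is a minimal sketch. It assumes the layout used by the Hugging Face `squad_v2` dataset, where an unanswerable question carries empty `text` and `answer_start` lists. Both records and the helper `is_answerable` are made up for illustration and are not taken from the dataset.

```python
# Hypothetical records in the SQuAD layout (not real dataset entries).
answerable = {
    "question": "Where is the Eiffel Tower?",
    "context": "The Eiffel Tower is in Paris.",
    "answers": {"text": ["Paris"], "answer_start": [23]},
}
unanswerable = {
    "question": "Who designed the Eiffel Tower?",
    "context": "The Eiffel Tower is in Paris.",
    # Assumed convention: no supported answer means both lists are empty.
    "answers": {"text": [], "answer_start": []},
}


def is_answerable(example):
    """Return True when the record carries at least one answer span."""
    return len(example["answers"]["text"]) > 0


for record in (answerable, unanswerable):
    print(record["question"], "->", is_answerable(record))
```

With `squad_v2 = False`, as in the first cell, every record is expected to be answerable, so a check like this only matters when the SQuAD2.0 variant is selected.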


### Loading the dataset

In [2]:
from datasets import load_dataset

In [3]:
dataset = load_dataset("squad_v2" if squad_v2 else "squad" )

In [4]:
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 87599
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 10570
    })
})

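#### Comparing datasets

Compared with the Yelp review dataset used in the quick start, SQuAD has columns for the context, the question, and the answers (plus `id` and `title`) instead of `label` and `text`. Its second split is also named `validation` rather than `test`:

**YelpReviewFull dataset:**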

```
DatasetDict({
    train: Dataset({
        features: ['label', 'text'],
        num_rows: 650000
    })
    test: Dataset({
        features: ['label', 'text'],
        num_rows: 50000
    })
})
```

In [5]:
dataset['train'][0]

{'id': '5733be284776f41900661182',
 'title': 'University_of_Notre_Dame',
 'context': 'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.',
 'question': 'To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?',
 'answers': {'text': ['Saint Bernadette Soubirous'], 'answer_start': [515]}}

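#### Locating the answer in the context

As the example above shows, an answer is represented by its start position in the context (here, character index 515) together with its full text, which is a substring of the context.

- `context`: the background passage
- `question`: the question
- `answers`: the answer text and the character position at which it starts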
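The relationship between `answer_start` and the answer text can be sketched in a few lines. The sentence and the helper `answer_span` below are illustrative assumptions, not part of the dataset or of any library; the point is only that slicing the context with the start position and the answer length gives back the answer.

```python
from typing import Tuple


def answer_span(context: str, answer_text: str, answer_start: int) -> Tuple[int, int]:
    """Return the (start, end) character span of the answer, end exclusive."""
    end = answer_start + len(answer_text)
    # A consistent record must reproduce the answer text exactly.
    assert context[answer_start:end] == answer_text
    return answer_start, end


# Made-up record in the same layout as the SQuAD `answers` field.
context = "Marie Curie won the Nobel Prize in Physics in 1903."
answers = {"text": ["1903"], "answer_start": [context.index("1903")]}

start, end = answer_span(context, answers["text"][0], answers["answer_start"][0])
print(start, end, context[start:end])
```

Because the positions are character offsets and not token positions, they have to be converted to token indices before a model can be trained on them.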

In [6]:
import numpy as np 
from datasets import ClassLabel, Sequence
import pandas as pd
from IPython.display import display, HTML

In [7]:
def show_random_elements(dataset, num_examples=10):
    assert num_examples <= dataset.num_rows, "Can't pick more elements than there are in the dataset."
    # Sample without replacement so that no row is shown twice
    picks = np.random.choice(dataset.num_rows, size=num_examples, replace=False).tolist()
    df = pd.DataFrame(dataset[picks])

    for column, typ in dataset.features.items():
        if isinstance(typ, ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
        elif isinstance(typ, Sequence) and isinstance(typ.feature, ClassLabel):
            df[column] = df[column].transform(lambda x: [typ.feature.names[i] for i in x])

    display(HTML(df.to_html()))

In [8]:
for column, typ in dataset['train'].features.items():
    print(typ)

Value(dtype='string', id=None)
Value(dtype='string', id=None)
Value(dtype='string', id=None)
Value(dtype='string', id=None)
Sequence(feature={'text': Value(dtype='string', id=None), 'answer_start': Value(dtype='int32', id=None)}, length=-1, id=None)


In [9]:
show_random_elements(dataset['train'])

Unnamed: 0,id,title,context,question,answers
0,57304b48396df91900096048,The_Blitz,"Regardless, the Luftwaffe could still inflict huge damage. With the German occupation of Western Europe, the intensification of submarine and air attack on Britain's sea communications was feared by the British. Such an event would have serious consequences on the future course of the war, should the Germans succeed. Liverpool and its port became an important destination for convoys heading through the Western Approaches from North America, bringing supplies and materials. The considerable rail network distributed to the rest of the country. Operations against Liverpool in the Liverpool Blitz were successful. Air attacks sank 39,126 long tons (39,754 t) of shipping, with another 111,601 long tons (113,392 t) damaged. Minister of Home Security Herbert Morrison was also worried morale was breaking, noting the defeatism expressed by civilians. Other sources point to half of the port's 144 berths rendered unusable, while cargo unloading capability was reduced by 75%. Roads and railways were blocked and ships could not leave harbour. On 8 May 1941, 57 ships were destroyed, sunk or damaged amounting to 80,000 long tons (81,000 t). Around 66,000 houses were destroyed, 77,000 people made homeless, and 1,900 people killed and 1,450 seriously hurt on one night. Operations against London up until May 1941 could also have a severe impact on morale. The populace of the port of Hull became 'trekkers', people who underwent a mass exodus from cities before, during, and after attacks. However, the attacks failed to knock out or damage railways, or port facilities for long, even in the Port of London, a target of many attacks. The Port of London in particular was an important target, bringing in one-third of overseas trade.",What did the British fear most?,"{'text': ['intensification of submarine and air attack'], 'answer_start': [109]}"
1,5727624e5951b619008f892d,Late_Middle_Ages,"The period saw several important technical innovations, like the principle of linear perspective found in the work of Masaccio, and later described by Brunelleschi. Greater realism was also achieved through the scientific study of anatomy, championed by artists like Donatello. This can be seen particularly well in his sculptures, inspired by the study of classical models. As the centre of the movement shifted to Rome, the period culminated in the High Renaissance masters da Vinci, Michelangelo and Raphael.",What did Donatello study that inspired sculptures?,"{'text': ['classical models'], 'answer_start': [357]}"
2,572bd4c0750c471900ed4c20,Tennessee,"Tennessee has played a critical role in the development of many forms of American popular music, including rock and roll, blues, country, and rockabilly.[not verified in body] Beale Street in Memphis is considered by many to be the birthplace of the blues, with musicians such as W.C. Handy performing in its clubs as early as 1909.[not verified in body] Memphis is also home to Sun Records, where musicians such as Elvis Presley, Johnny Cash, Carl Perkins, Jerry Lee Lewis, Roy Orbison, and Charlie Rich began their recording careers, and where rock and roll took shape in the 1950s.[not verified in body] The 1927 Victor recording sessions in Bristol generally mark the beginning of the country music genre and the rise of the Grand Ole Opry in the 1930s helped make Nashville the center of the country music recording industry.[not verified in body] Three brick-and-mortar museums recognize Tennessee's role in nurturing various forms of popular music: the Memphis Rock N' Soul Museum, the Country Music Hall of Fame and Museum in Nashville, and the International Rock-A-Billy Museum in Jackson. Moreover, the Rockabilly Hall of Fame, an online site recognizing the development of rockabilly in which Tennessee played a crucial role, is based in Nashville.[not verified in body]",Which Tennessee city is home to the Country Music Hall of Fame?,"{'text': ['Nashville'], 'answer_start': [1034]}"
3,57278395f1498d1400e8fa54,Switzerland,"During World War II, detailed invasion plans were drawn up by the Germans, but Switzerland was never attacked. Switzerland was able to remain independent through a combination of military deterrence, concessions to Germany, and good fortune as larger events during the war delayed an invasion. Under General Henri Guisan central command, a general mobilisation of the armed forces was ordered. The Swiss military strategy was changed from one of static defence at the borders to protect the economic heartland, to one of organised long-term attrition and withdrawal to strong, well-stockpiled positions high in the Alps known as the Reduit. Switzerland was an important base for espionage by both sides in the conflict and often mediated communications between the Axis and Allied powers.",What were the Reduit?,"{'text': ['strong, well-stockpiled positions high in the Alps'], 'answer_start': [569]}"
4,572ec0e9cb0c0d14000f1500,Muammar_Gaddafi,"Having removed the monarchical government, Gaddafi proclaimed the foundation of the Libyan Arab Republic. Addressing the populace by radio, he proclaimed an end to the ""reactionary and corrupt"" regime, ""the stench of which has sickened and horrified us all."" Due to the coup's bloodless nature, it was initially labelled the ""White Revolution"", although was later renamed the ""One September Revolution"" after the date on which it occurred. Gaddafi insisted that the Free Officers' coup represented a revolution, marking the start of widespread change in the socio-economic and political nature of Libya. He proclaimed that the revolution meant ""freedom, socialism, and unity"", and over the coming years implemented measures to achieve this.",How did Gaddafi announced his leadership?:,"{'text': ['Addressing the populace by radio'], 'answer_start': [106]}"
5,56df7b3d56340a1900b29c0e,Plymouth,"Plymouth City Council is responsible for waste management throughout the city and South West Water is responsible for sewerage. Plymouth's electricity is supplied from the National Grid and distributed to Plymouth via Western Power Distribution. On the outskirts of Plympton a combined cycle gas-powered station, the Langage Power Station, which started to produce electricity for Plymouth at the end of 2009.",Who distributes electricity in Plymouth?,"{'text': ['Western Power Distribution'], 'answer_start': [218]}"
6,56d1162317492d1400aab8ec,The_Legend_of_Zelda:_Twilight_Princess,"The story focuses on series protagonist Link, who tries to prevent Hyrule from being engulfed by a corrupted parallel dimension known as the Twilight Realm. To do so, he takes the form of both a Hylian and a wolf, and is assisted by a mysterious creature named Midna. The game takes place hundreds of years after Ocarina of Time and Majora's Mask, in an alternate timeline from The Wind Waker.",Who is the protagonist is Legend of Zelda?,"{'text': ['Link'], 'answer_start': [40]}"
7,56f6e729711bf01900a4485b,Classical_music,"Classical music is art music produced or rooted in the traditions of Western music, including both liturgical (religious) and secular music. While a similar term is also used to refer to the period from 1750 to 1820 (the Classical period), this article is about the broad span of time from roughly the 11th century to the present day, which includes the Classical period and various other periods. The central norms of this tradition became codified between 1550 and 1900, which is known as the common practice period. The major time divisions of classical music are as follows: the early music period, which includes the Medieval (500–1400) and the Renaissance (1400–1600) eras; the Common practice period, which includes the Baroque (1600–1750), Classical (1750–1820), and Romantic eras (1804–1910); and the 20th century (1901–2000) which includes the modern (1890–1930) that overlaps from the late 19th-century, the high modern (mid 20th-century), and contemporary or postmodern (1975–2015) eras.[citation needed]",From 1804-1910 was called what era?,"{'text': ['Romantic'], 'answer_start': [775]}"
8,5725ce0a38643c19005acd47,Israel,"Israel is a leading country in the development of solar energy. Israel is a global leader in water conservation and geothermal energy, and its development of cutting-edge technologies in software, communications and the life sciences have evoked comparisons with Silicon Valley. According to the OECD, Israel is also ranked 1st in the world in expenditure on Research and Development (R&D) as a percentage of GDP. Intel and Microsoft built their first overseas research and development centers in Israel, and other high-tech multi-national corporations, such as IBM, Google, Apple, HP, Cisco Systems, and Motorola, have opened R&D facilities in the country.",Israel is a leading country of what development?,"{'text': ['solar energy'], 'answer_start': [50]}"
9,572831873acd2414000df6b3,European_Central_Bank,"Think-tanks such as the World Pensions Council have also argued that European legislators have pushed somewhat dogmatically for the adoption of the Basel II recommendations, adopted in 2005, transposed in European Union law through the Capital Requirements Directive (CRD), effective since 2008. In essence, they forced European banks, and, more importantly, the European Central Bank itself e.g. when gauging the solvency of financial institutions, to rely more than ever on standardised assessments of credit risk marketed by two non-European private agencies: Moody's and S&P.",What said that agencies had to start using Moody's and S&P to assess financial institutions?,"{'text': ['the Basel II recommendations'], 'answer_start': [144]}"


## Preprocessing the data

In [10]:
from transformers import AutoTokenizer

In [11]:
# Load the tokenizer that matches the pretrained model.
# AutoTokenizer.from_pretrained() infers which tokenizer class to use from the
# checkpoint (model_checkpoint), which is usually a model name such as
# bert-base-uncased or gpt2, or a local path.
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

The following assertion makes sure that our tokenizer is a fast tokenizer (implemented in Rust), which has advantages in both speed and functionality.

In [12]:
import transformers
assert isinstance(tokenizer, transformers.PreTrainedTokenizerFast)

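The big table of models in the Transformers documentation shows which model types have a fast tokenizer available and which do not.

The tokenizer can be called directly on two sentences (one for the question, one for the context):


Two-sentence input: when you pass two sentences to the tokenizer, it treats them as a sentence pair. This is common in tasks such as question answering or sentence-relation classification.

Tokenization: the tokenizer splits each sentence into smaller units (words, subwords, or symbols). For some models, such as BERT, it also adds special tokens like `[CLS]` and `[SEP]` to mark the start of the input and to separate the sentences.

Output: the tokenizer output usually contains several components. The main one is `input_ids` (the vocabulary IDs of the tokens). Depending on the model, there may also be `attention_mask` (marks which IDs are real tokens and which are padding) and `token_type_ids` (marks which sentence each token belongs to). The DistilBERT tokenizer used here returns no `token_type_ids`, as the output below shows.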
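The structure of a pair encoding can be illustrated with a toy encoder. This is only a sketch under simplifying assumptions: it splits on whitespace instead of using WordPiece, builds its vocabulary on the fly, and the function `encode_pair` is hypothetical. The real tokenizer uses a fixed vocabulary, so the IDs will differ.

```python
def encode_pair(question, context, vocab):
    """Toy pair encoder: whitespace tokens wrapped in BERT-style special tokens."""
    q_tokens = question.lower().split()
    c_tokens = context.lower().split()
    tokens = ["[CLS]"] + q_tokens + ["[SEP]"] + c_tokens + ["[SEP]"]
    return {
        "tokens": tokens,
        # Assign the next free ID to every token not seen before.
        "input_ids": [vocab.setdefault(t, len(vocab)) for t in tokens],
        # No padding here, so every position is a real token.
        "attention_mask": [1] * len(tokens),
        # 0 for [CLS] + question + first [SEP], 1 for context + final [SEP].
        "token_type_ids": [0] * (len(q_tokens) + 2) + [1] * (len(c_tokens) + 1),
    }


vocab = {}
encoded = encode_pair("what is your name", "my name is anmin", vocab)
for key, value in encoded.items():
    print(key, value)
```

Repeated words such as `name` and `is` receive the same ID in both sentences, which is why `token_type_ids` (or, for models without it, the position of `[SEP]`) is needed to tell the two segments apart.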


In [13]:
# Encode a sentence pair
token = tokenizer("what is your name?", "my name is AnMin")

In [14]:
token

{'input_ids': [101, 2054, 2003, 2115, 2171, 1029, 102, 2026, 2171, 2003, 2019, 10020, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

In [64]:
token.keys()

dict_keys(['input_ids', 'attention_mask'])

In [15]:
tokenizer.decode(token['input_ids'])

'[CLS] what is your name? [SEP] my name is anmin [SEP]'

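### Advanced tokenizer usage

**One issue specific to question-answering preprocessing is how to handle very long documents.**

In other tasks we usually truncate a document when it is longer than the model's maximum sequence length, but here removing part of the context might remove the answer we are looking for.

To deal with this, we allow one (long) example in the dataset to produce several input features, each shorter than the model's maximum length (or than the hyperparameter we set).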
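The idea of turning one long example into several overlapping features can be sketched as a sliding window over the context tokens. The helper `split_with_overlap` is a hypothetical illustration in plain Python, not the tokenizer's implementation; single letters stand in for tokens so that the overlap is easy to see.

```python
def split_with_overlap(tokens, window, overlap):
    """Split tokens into chunks of at most `window` items.

    Consecutive chunks share `overlap` items, so an answer that straddles
    a chunk boundary still appears whole in at least one chunk, as long as
    it is no longer than `overlap`.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        # Step forward by less than a full window to create the overlap.
        start += window - overlap
    return chunks


context_tokens = list("abcdefghij")  # stand-in for 10 context tokens
chunks = split_with_overlap(context_tokens, window=6, overlap=2)
for chunk in chunks:
    print("".join(chunk))
```

In this toy run the two chunks share the tokens `e` and `f`. Choosing the overlap is a trade-off: a larger value protects longer answers but produces more features per example.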

In [16]:
# The maximum length of a feature (question and context)
max_length = 384

# The allowed overlap between two parts of the context when it has to be split
doc_stride = 128

#### Handling text longer than the maximum length

Below, we look for an example in the training set that is longer than the maximum length (384):

In [17]:
# Find an example whose tokenized question and context exceed 384 tokens
for i, example in enumerate(dataset["train"]):
    if len(tokenizer(example["question"], example["context"])["input_ids"]) > 384:
        break
# Keep the first example that exceeds the maximum length
example = dataset["train"][i]

In [18]:
example

{'id': '5733caf74776f4190066124c',
 'title': 'University_of_Notre_Dame',
 'context': "The men's basketball team has over 1,600 wins, one of only 12 schools who have reached that mark, and have appeared in 28 NCAA tournaments. Former player Austin Carr holds the record for most points scored in a single game of the tournament with 61. Although the team has never won the NCAA Tournament, they were named by the Helms Athletic Foundation as national champions twice. The team has orchestrated a number of upsets of number one ranked teams, the most notable of which was ending UCLA's record 88-game winning streak in 1974. The team has beaten an additional eight number-one teams, and those nine wins rank second, to UCLA's 10, all-time in wins against the top team. The team plays in newly renovated Purcell Pavilion (within the Edmund P. Joyce Center), which reopened for the beginning of the 2009–2010 season. The team is coached by Mike Brey, who, as of the 2014–15 season, his fifteenth at Notre

In [19]:
# Number of tokens in the encoded question and context

len(tokenizer(example["question"], example["context"])["input_ids"])

396

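#### Truncating the context and discarding the overflow

Options for the `truncation` parameter:

- `True` or `'longest_first'`: when the input is longer than the maximum length, tokens are removed one at a time from the longest sequence until the total length fits. With a pair of sequences (for example, in text-pair tasks), this means the longer of the two is shortened first.

- `'only_first'`: for a pair of sequences (for example, in question answering or text comparison), only the first sequence (usually the question or hypothesis) is truncated, and the second sequence (usually the context or premise) is kept intact.

- `'only_second'`: the opposite of `'only_first'`. Only the second sequence is truncated and the first is kept intact. In question answering this keeps the question complete.

- `False` or `'do_not_truncate'` (the default): no truncation is applied. The tokenizer does not raise an error by itself, so it may return a sequence longer than the model can accept. Use this option only when the inputs are already known to fit.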
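The four options can be compared side by side with a toy function. This is a simplified sketch and not the library's code: `truncate_pair` is a hypothetical helper, and `max_tokens` counts only question and context tokens, ignoring the special tokens that a real tokenizer adds.

```python
def truncate_pair(first, second, max_tokens, strategy):
    """Toy pair truncation; returns the (possibly shortened) two sequences."""
    first, second = list(first), list(second)
    excess = len(first) + len(second) - max_tokens
    if strategy == "do_not_truncate" or excess <= 0:
        return first, second
    if strategy == "only_first":
        if excess > len(first):
            raise ValueError("first sequence is too short to absorb the truncation")
        return first[:len(first) - excess], second
    if strategy == "only_second":
        if excess > len(second):
            raise ValueError("second sequence is too short to absorb the truncation")
        return first, second[:len(second) - excess]
    if strategy == "longest_first":
        for _ in range(excess):
            # Drop one token at a time from whichever sequence is currently longer.
            if len(first) > len(second):
                first.pop()
            else:
                second.pop()
        return first, second
    raise ValueError(f"unknown strategy: {strategy}")


question = ["how", "many", "wins"]
context = [f"c{i}" for i in range(8)]
for strategy in ("longest_first", "only_first", "only_second", "do_not_truncate"):
    q, c = truncate_pair(question, context, max_tokens=9, strategy=strategy)
    print(strategy, len(q), len(c))
```

For question answering the context is almost always the longer sequence, so `longest_first` and `only_second` tend to give the same result, while `only_first` sacrifices the question.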

In [25]:
# truncation=True: truncate to max_length with the longest_first strategy
token = tokenizer(example["question"],example["context"],
              max_length = max_length,
              truncation = True)
print(len(token['input_ids']))
tokenizer.decode(token['input_ids'])

384


"[CLS] how many wins does the notre dame men's basketball team have? [SEP] the men's basketball team has over 1, 600 wins, one of only 12 schools who have reached that mark, and have appeared in 28 ncaa tournaments. former player austin carr holds the record for most points scored in a single game of the tournament with 61. although the team has never won the ncaa tournament, they were named by the helms athletic foundation as national champions twice. the team has orchestrated a number of upsets of number one ranked teams, the most notable of which was ending ucla's record 88 - game winning streak in 1974. the team has beaten an additional eight number - one teams, and those nine wins rank second, to ucla's 10, all - time in wins against the top team. the team plays in newly renovated purcell pavilion ( within the edmund p. joyce center ), which reopened for the beginning of the 2009 – 2010 season. the team is coached by mike brey, who, as of the 2014 – 15 season, his fifteenth at not

In [32]:
# truncation="only_first": for a sequence pair, truncate only the first sequence
# (here the question) and keep the second sequence (the context) intact.
token = tokenizer(example["question"],example["context"],
              max_length = max_length,
              truncation = "only_first")
print(len(token['input_ids']))
tokenizer.decode(token['input_ids'])

384


"[CLS] how many [SEP] the men's basketball team has over 1, 600 wins, one of only 12 schools who have reached that mark, and have appeared in 28 ncaa tournaments. former player austin carr holds the record for most points scored in a single game of the tournament with 61. although the team has never won the ncaa tournament, they were named by the helms athletic foundation as national champions twice. the team has orchestrated a number of upsets of number one ranked teams, the most notable of which was ending ucla's record 88 - game winning streak in 1974. the team has beaten an additional eight number - one teams, and those nine wins rank second, to ucla's 10, all - time in wins against the top team. the team plays in newly renovated purcell pavilion ( within the edmund p. joyce center ), which reopened for the beginning of the 2009 – 2010 season. the team is coached by mike brey, who, as of the 2014 – 15 season, his fifteenth at notre dame, has achieved a 332 - 165 record. in 2009 the

In [33]:
# truncation="only_second": the opposite of "only_first". Truncate only the second
# sequence (here the context) and keep the first sequence (the question) intact.

token = tokenizer(example["question"],example["context"],
              max_length = max_length,
              truncation = "only_second")
print(len(token['input_ids']))
tokenizer.decode(token['input_ids'])


384


"[CLS] how many wins does the notre dame men's basketball team have? [SEP] the men's basketball team has over 1, 600 wins, one of only 12 schools who have reached that mark, and have appeared in 28 ncaa tournaments. former player austin carr holds the record for most points scored in a single game of the tournament with 61. although the team has never won the ncaa tournament, they were named by the helms athletic foundation as national champions twice. the team has orchestrated a number of upsets of number one ranked teams, the most notable of which was ending ucla's record 88 - game winning streak in 1974. the team has beaten an additional eight number - one teams, and those nine wins rank second, to ucla's 10, all - time in wins against the top team. the team plays in newly renovated purcell pavilion ( within the edmund p. joyce center ), which reopened for the beginning of the 2009 – 2010 season. the team is coached by mike brey, who, as of the 2014 – 15 season, his fifteenth at not

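#### Truncation strategies

- Discard the overflow: with `truncation="only_second"` alone, only the context is truncated and every token beyond `max_length` is dropped.
- Keep the overflow as extra features: adding `return_overflowing_tokens=True` and `stride` still truncates only the context and keeps the question, but the removed tokens are returned as additional features. `stride` is the number of tokens by which consecutive features overlap.
- With `return_overflowing_tokens=True`, the fast tokenizer also returns an extra field, `overflow_to_sample_mapping`, which records the example each feature came from.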

In [34]:
tokenized_example = tokenizer(
    example["question"],
    example["context"],
    max_length=max_length,
    truncation="only_second",
    return_overflowing_tokens=True,
    stride=doc_stride
)

In [37]:
len(tokenized_example["input_ids"])

2

With this strategy, the tokenizer returns several `input_ids` lists.

In [38]:
[len(x) for x in tokenized_example["input_ids"]]

[384, 157]
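The two lengths can be checked by hand. The sketch below is an arithmetic illustration, not the tokenizer's code, and it relies on an assumption about how the 396 tokens break down: 14 question tokens, 3 special tokens (`[CLS]` and two `[SEP]`), and therefore 379 context tokens. The helper `feature_lengths` is hypothetical.

```python
def feature_lengths(n_context, n_question, n_special, max_length, stride):
    """Lengths of the features produced when only the context is split."""
    window = max_length - n_question - n_special  # context tokens per feature
    if not 0 <= stride < window:
        raise ValueError("stride must be non-negative and smaller than the window")
    lengths, start = [], 0
    while True:
        n_chunk = min(window, n_context - start)
        # Every feature repeats the question and the special tokens.
        lengths.append(n_chunk + n_question + n_special)
        if start + window >= n_context:
            break
        start += window - stride
    return lengths


# Assumed breakdown of the 396 tokens (see the paragraph above).
print(feature_lengths(n_context=379, n_question=14, n_special=3,
                      max_length=384, stride=128))
```

Under these assumptions the first feature holds 367 context tokens, and the second restarts 128 tokens before the cut, which leaves 140 context tokens plus the 17 repeated tokens.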

Decoding the two input features shows the overlapping part:

In [39]:
for x in tokenized_example["input_ids"][:2]:
    print(tokenizer.decode(x))

[CLS] how many wins does the notre dame men's basketball team have? [SEP] the men's basketball team has over 1, 600 wins, one of only 12 schools who have reached that mark, and have appeared in 28 ncaa tournaments. former player austin carr holds the record for most points scored in a single game of the tournament with 61. although the team has never won the ncaa tournament, they were named by the helms athletic foundation as national champions twice. the team has orchestrated a number of upsets of number one ranked teams, the most notable of which was ending ucla's record 88 - game winning streak in 1974. the team has beaten an additional eight number - one teams, and those nine wins rank second, to ucla's 10, all - time in wins against the top team. the team plays in newly renovated purcell pavilion ( within the edmund p. joyce center ), which reopened for the beginning of the 2009 – 2010 season. the team is coached by mike brey, who, as of the 2014 – 15 season, his fifteenth at notr

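#### Using offset_mapping to map tokens back to the original text

Setting `return_offsets_mapping=True` makes the tokenizer return, for every token in each of the `input_ids` lists produced by the split, the position of that token in the original text.

Each token gets a tuple `(start, end)` of character-level offsets into the original, untokenized text, where `start` is the position of the token's first character and `end` is the position just past its last character (exclusive).

For example, the first token (`[CLS]`) has the offsets `(0, 0)` because it does not correspond to any part of the question or context, and the second token corresponds to characters 0 to 3 of the question.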
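Character offsets are easy to reproduce with a toy tokenizer. The function `tokenize_with_offsets` below is a hypothetical stand-in that splits on words and punctuation with a regular expression. It is not how the real tokenizer works, but the offsets it returns behave the same way.

```python
import re


def tokenize_with_offsets(text):
    """Toy tokenizer: lowercase words and punctuation with (start, end) character spans."""
    return [
        (match.group().lower(), (match.start(), match.end()))
        for match in re.finditer(r"\w+|[^\w\s]", text)
    ]


question = "How many wins?"
for token, (start, end) in tokenize_with_offsets(question):
    # Slicing the original text with the offsets recovers the original casing.
    print(token, (start, end), question[start:end])
```

The token itself is lowercased, but slicing the original string with its offsets returns the text exactly as written, which is what makes it possible to report answers in their original form.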



In [82]:
tokenized_example = tokenizer(
    example["question"],
    example["context"],
    max_length=max_length,
    truncation="only_second",
    return_overflowing_tokens=True,
    return_offsets_mapping=True,
    stride=doc_stride
)

In [68]:
tokenized_example.keys()

dict_keys(['input_ids', 'attention_mask', 'offset_mapping', 'overflow_to_sample_mapping'])

In [83]:

start, end = tokenized_example['offset_mapping'][0][1]
example["question"][start:end]

'How'

In [84]:
first_token_id = tokenized_example["input_ids"][0][1]
offsets = tokenized_example["offset_mapping"][0][1]
print(tokenizer.convert_ids_to_tokens([first_token_id])[0], example["question"][offsets[0]:offsets[1]])

how How


In [85]:
second_token_id = tokenized_example["input_ids"][0][2]
offsets = tokenized_example["offset_mapping"][0][2]
print(tokenizer.convert_ids_to_tokens([second_token_id])[0], example["question"][offsets[0]:offsets[1]])

many many


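#### Difference between convert_ids_to_tokens and decode

- `convert_ids_to_tokens` maps token IDs to token strings one by one: it accepts a single ID or a list of IDs and returns the matching token or list of tokens.
- `decode` works at the level of a whole string: it takes one sequence of IDs per call and returns a single piece of text.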
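The difference can be mimicked with a toy vocabulary. Everything below is an assumption made for illustration: the IDs are invented and do not match any real model, and `toy_convert_ids_to_tokens` and `toy_decode` are hypothetical stand-ins for the tokenizer methods. The sketch only shows that one call returns a list of pieces while the other merges them into text.

```python
# Hypothetical vocabulary; the IDs are made up for this sketch.
ID_TO_TOKEN = {0: "[CLS]", 1: "[SEP]", 2: "my", 3: "name", 4: "is", 5: "an", 6: "##min"}


def toy_convert_ids_to_tokens(ids):
    """One token string per ID; subword pieces keep their '##' prefix."""
    return [ID_TO_TOKEN[i] for i in ids]


def toy_decode(ids):
    """Join the whole sequence into one string, merging '##' pieces into words."""
    words = []
    for token in toy_convert_ids_to_tokens(ids):
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
    return " ".join(words)


ids = [0, 2, 3, 4, 5, 6, 1]
print(toy_convert_ids_to_tokens(ids))
print(toy_decode(ids))
```

The token list keeps the subword boundary visible, while the decoded string hides it, so the token-level view is the one to use when aligning tokens with offsets.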


In [86]:
example["question"]

"How many wins does the Notre Dame men's basketball team have?"