# spaCy
spaCy is a popular Natural Language Processing (NLP) library designed for production use. It offers a wide range of functionality for working with text, tokens, entities, and more. Below is a list of key functions and features in the **spaCy** library:

### 1. **Core Functions in spaCy**:
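   - **`spacy.load()`**: Loads an installed pipeline package by name, such as `en_core_web_sm` (install it first with `python -m spacy download en_core_web_sm`).
     ```python
     import spacy
     nlp = spacy.load("en_core_web_sm")
     ```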
   - **`nlp()`**: Calling the loaded pipeline on a string runs every component and returns a processed `Doc`.
     ```python
     doc = nlp("This is a sentence.")
     ```

### 2. **Tokenization and Text Processing**:
   - **`Doc`**: The core data structure, representing a processed document.
   - **`Token`**: Represents individual tokens in a document.
     ```python
     for token in doc:
         print(token.text, token.pos_, token.dep_)
     ```
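   - **`Span`**: A slice of a `Doc` covering one or more tokens, e.g. `doc[1:3]`.
   - **`.sents`**: Iterates over the sentences in a document; this requires a component that sets sentence boundaries, such as the dependency parser or the `sentencizer`.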
     ```python
     for sent in doc.sents:
         print(sent.text)
     ```
   - **`.ents`**: The named entities in a document, as a tuple of `Span` objects (filled in by the entity recognizer).
     ```python
     for ent in doc.ents:
         print(ent.text, ent.label_)
     ```
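
As a minimal sketch of how `Doc`, `Token`, and `Span` relate, the example below uses a blank English pipeline (tokenizer only, an assumption made here so that no trained model has to be installed) and an arbitrary sample sentence.

```python
import spacy

# A blank pipeline only tokenizes; no trained model is needed for this sketch.
nlp = spacy.blank("en")
doc = nlp("spaCy builds Doc objects from raw text.")

# A Doc is a sequence of Token objects.
tokens = [token.text for token in doc]

# Slicing a Doc yields a Span; start and end are token indices.
span = doc[2:4]

print(tokens)
print(span.text, span.start, span.end)
```

Attributes such as `.pos_`, `.dep_`, and `.ents` stay empty in a blank pipeline because they are filled in by trained components.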

### 3. **Linguistic Features**:
   - **`.lemma_`**: The lemma (base form) of a token.
   - **`.pos_`**: The coarse-grained part-of-speech tag of a token (e.g., `NOUN`, `VERB`).
   - **`.dep_`**: The token's syntactic dependency relation to its head (e.g., `nsubj`).
   - **`.is_alpha`**, **`.is_stop`**: Whether a token consists of alphabetic characters, and whether it is a stop word.
   - **`.shape_`**: The orthographic shape of the token (e.g., `Xxxxx` for "Apple", `dddd` for "2020").
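
The lexical attributes in this list can be tried without a trained model. The sketch below assumes a blank English pipeline and a made-up sentence; `.lemma_`, `.pos_`, and `.dep_` are left out because they need trained components.

```python
import spacy

# Tokenizer only; lexical attributes such as shape_ need no trained model.
nlp = spacy.blank("en")
doc = nlp("Apple bought 3 startups in 2020")

for token in doc:
    print(token.text, token.shape_, token.is_alpha, token.is_stop)

# Runs of the same character class are capped at four characters in shape_.
shapes = [token.shape_ for token in doc]
```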

### 4. **Named Entity Recognition (NER)**:
   - **`.ents`**: The entity spans predicted by the pipeline's entity recognizer; each has a text and a label.
     ```python
     for ent in doc.ents:
         print(ent.text, ent.label_)
     ```
   - **`spacy.explain()`**: Provides explanations of labels (POS, NER, etc.).
     ```python
     spacy.explain("ORG")
     ```
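
To illustrate `spacy.explain()` a little further, the sketch below looks up a few common entity, tag, and dependency labels. The label list is just a sample chosen for this example.

```python
import spacy

# spacy.explain looks labels up in spaCy's built-in glossary.
for label in ["ORG", "GPE", "NNP", "nsubj"]:
    print(label, "->", spacy.explain(label))

# Labels that are not in the glossary return None.
unknown = spacy.explain("NOT_A_REAL_LABEL")
print(unknown)
```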

### 5. **Text Classification**:
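   - **`doc.cats`**: A dictionary mapping each category label to its predicted score.
   - **`textcat`**: The built-in text classification pipeline component, added with `nlp.add_pipe("textcat")` in spaCy v3; it must be trained before it produces scores.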
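
A minimal sketch of setting up the text classifier, assuming spaCy v3 and a blank English pipeline. The label names are hypothetical, and the component is left untrained, so `doc.cats` would only carry meaningful scores after training.

```python
import spacy

nlp = spacy.blank("en")

# Add the built-in (still untrained) text classifier and declare its labels.
textcat = nlp.add_pipe("textcat")
textcat.add_label("POSITIVE")  # hypothetical label names
textcat.add_label("NEGATIVE")

print(nlp.pipe_names)
print(textcat.labels)
```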

### 6. **Vector Representations**:
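   - **`.vector`**: The word vector (embedding) of a token; for a `Doc` or `Span`, the average of its token vectors by default.
   - **`.similarity()`**: Computes the cosine similarity between documents, spans, or tokens from their vectors. Results are only meaningful with a pipeline that ships word vectors (e.g., `en_core_web_md` or `en_core_web_lg`); the small `sm` packages do not include them.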
     ```python
     similarity_score = doc1.similarity(doc2)
     ```
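
To show how `.vector` and `.similarity()` fit together without downloading a vectors package, the sketch below registers three hand-written toy vectors on a blank pipeline. The values are illustrative assumptions, not real embeddings.

```python
import numpy as np
import spacy

nlp = spacy.blank("en")

# Hand-written 3-dimensional toy vectors (illustrative values only).
toy_vectors = {
    "cat": [1.0, 1.0, 0.0],
    "dog": [1.0, 0.9, 0.1],
    "car": [0.0, 0.0, 1.0],
}
for word, values in toy_vectors.items():
    nlp.vocab.set_vector(word, np.asarray(values, dtype="float32"))

doc = nlp("cat dog car")
cat, dog, car = doc[0], doc[1], doc[2]

print(cat.vector)
print(cat.similarity(dog))  # near-parallel vectors: close to 1
print(cat.similarity(car))  # orthogonal vectors: 0
```

With a real pipeline such as `en_core_web_md`, the vectors come with the package and the `set_vector` step is unnecessary.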

### 7. **Pipeline Components**:
   - **`nlp.pipe()`**: Efficiently processes a batch of texts.
     ```python
     for doc in nlp.pipe(["Sentence 1", "Sentence 2"]):
         print(doc)
     ```
   - **`nlp.add_pipe()`**: Adds a component to the pipeline; in spaCy v3 the component is referenced by its registered string name.
   - **`nlp.remove_pipe()`**: Removes a component from the pipeline by name.
   - **`nlp.get_pipe()`**: Retrieves a pipeline component by name.
   - **`nlp.pipeline`**: The list of `(name, component)` tuples in processing order; `nlp.pipe_names` holds just the names.
     ```python
     print(nlp.pipeline)
     ```
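
The pipeline methods above can be combined as in the following sketch. It assumes spaCy v3 and uses the rule-based `sentencizer` on a blank pipeline so that no trained model is required.

```python
import spacy

nlp = spacy.blank("en")

# Built-in components are added by their registered string name.
nlp.add_pipe("sentencizer")
print(nlp.pipe_names)

texts = ["First sentence. Second one.", "Only one here."]
# nlp.pipe streams Doc objects and processes the texts in batches.
sentence_counts = [len(list(doc.sents)) for doc in nlp.pipe(texts)]
print(sentence_counts)

# remove_pipe returns the removed (name, component) pair.
removed_name, _ = nlp.remove_pipe("sentencizer")
print(removed_name, nlp.pipe_names)
```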

### 8. **Pattern Matching**:
   - **`Matcher`**: A rule-based token matcher.
     ```python
     from spacy.matcher import Matcher
     matcher = Matcher(nlp.vocab)
     ```
   - **`PhraseMatcher`**: Efficiently matches large lists of exact phrases, supplied as `Doc` patterns.
   - **`DependencyMatcher`**: Matches based on the syntactic dependency structure.
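
A short sketch of the `Matcher` in use. The rule name `HELLO_WORLD` and the sample text are made up for illustration, and a blank pipeline is assumed since the pattern only relies on token text and punctuation flags.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Pattern: "hello", an optional punctuation token, then "world" (case-insensitive).
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True, "OP": "?"}, {"LOWER": "world"}]
matcher.add("HELLO_WORLD", [pattern])  # hypothetical rule name

doc = nlp("Hello, world! Hello world!")

# Each match is a (match_id, start, end) triple of token indices.
matched_texts = sorted(doc[start:end].text for _, start, end in matcher(doc))
print(matched_texts)
```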

### 9. **Visualization**:
   - **`displacy.render()`**: Visualizes the dependency tree or named entities in a document.
     ```python
     from spacy import displacy
     displacy.render(doc, style="dep")
     ```
   - **`displacy.serve()`**: Starts a local web server so the visualization can be viewed in a browser.
     ```python
     displacy.serve(doc, style="dep")
     ```
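
Outside a notebook, `displacy.render()` can also return the markup as a string. The sketch below assumes a blank pipeline and sets the entities by hand (standing in for a trained NER component) so that the `ent` style has something to draw.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Acme Corp opened an office in Berlin")

# Hand-annotated entities stand in for a trained NER component's output.
doc.ents = [Span(doc, 0, 2, label="ORG"), Span(doc, 6, 7, label="GPE")]

# jupyter=False makes render() return the markup instead of displaying it.
html = displacy.render(doc, style="ent", jupyter=False)
print(html[:80])
```

The returned string can be written to an `.html` file and opened in a browser.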

### 10. **Training and Fine-tuning**:
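   - **`spacy.training`**: Training utilities, including the `Example` class that pairs a predicted `Doc` with its reference annotations.
   - **`python -m spacy train`**: The command-line entry point for training a pipeline from a config file in spaCy v3.
   - **`python -m spacy evaluate`**: The command-line entry point for evaluating a trained pipeline on held-out data; `nlp.evaluate()` does the same from Python.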
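
In spaCy v3, training and evaluation data is passed around as `Example` objects from `spacy.training`. The sketch below builds one from character-offset annotations; the sentence and the entity offsets are made up for illustration.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")

# Character offsets (start, end, label) mark the gold-standard entity.
text = "Apple is opening a new office"
annotations = {"entities": [(0, 5, "ORG")]}

# The Example pairs the unannotated Doc with its reference annotations.
example = Example.from_dict(nlp.make_doc(text), annotations)
gold_entities = [(ent.text, ent.label_) for ent in example.reference.ents]
print(gold_entities)
```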

### 11. **Language Models**:
   - **`Language`**: The core class for NLP pipelines.
   - **`Vocab`**: Stores the vocabulary of a language.
   - **`Tokenizer`**: Handles tokenization.
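   - **`Tagger`**, **`DependencyParser`**, **`EntityRecognizer`**: Trainable pipeline components from `spacy.pipeline` for part-of-speech tagging, dependency parsing, and named entity recognition.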
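
The relationship between these classes can be checked directly on a pipeline object. This is a small sketch assuming a blank English pipeline; the sample sentence is arbitrary.

```python
import spacy
from spacy.language import Language
from spacy.tokenizer import Tokenizer
from spacy.vocab import Vocab

nlp = spacy.blank("en")

print(isinstance(nlp, Language))             # language-specific subclass of Language
print(isinstance(nlp.vocab, Vocab))          # vocabulary shared across the pipeline
print(isinstance(nlp.tokenizer, Tokenizer))  # rule-based tokenizer

# The tokenizer can be called on its own to get a Doc with no further processing.
doc = nlp.tokenizer("Tokenizers split text.")
token_texts = [token.text for token in doc]
print(token_texts)
```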

### 12. **Data Management**:
   - **`spacy.blank()`**: Creates a blank pipeline for a language, containing only its tokenizer and language data, as a starting point for custom NLP tasks.
     ```python
     nlp = spacy.blank("en")
     ```
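   - **`Vectors`**: Stores word vectors and reads them with `Vectors.from_disk()`; the `python -m spacy init vectors` command converts a vectors file into a loadable pipeline.
   - **`DocBin`**: Serializes a collection of `Doc` objects compactly for storage (imported from `spacy.tokens`).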
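
A minimal sketch of a `DocBin` round trip, serializing to bytes in memory and restoring the documents. The two sample texts are placeholders, and a blank pipeline is assumed.

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
texts = ["First document.", "Second document."]

# Collect processed Doc objects in a DocBin.
doc_bin = DocBin()
for doc in nlp.pipe(texts):
    doc_bin.add(doc)

# Serialize to bytes (to_disk / from_disk work the same way with a file path).
data = doc_bin.to_bytes()

# Restoring needs a Vocab to rebuild the Doc objects.
restored = DocBin().from_bytes(data)
restored_texts = [doc.text for doc in restored.get_docs(nlp.vocab)]
print(restored_texts)
```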

### 13. **Word Vectors & Embeddings**:
   - **`.vector`**: Retrieves word vectors for tokens and docs.
     ```python
     doc[0].vector
     ```
   - **`.similarity()`**: Computes similarity between documents, spans, or tokens based on vectors.
     ```python
     doc1.similarity(doc2)
     ```

### 14. **Custom Components & Extensions**:
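   - **`@Language.component()`**: Registers a function that takes a `Doc` and returns it as a named pipeline component, which can then be added with `nlp.add_pipe()`.
     ```python
     from spacy.language import Language

     @Language.component("custom_component")
     def custom_component(doc):
         # Inspect or modify the Doc here, then pass it on.
         return doc
     ```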
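   - **`.set_extension()`**: Registers a custom attribute on `Doc`, `Span`, or `Token`; the attribute is then accessed through the `._` namespace (e.g., `token._.is_custom`).
     ```python
     from spacy.tokens import Token
     Token.set_extension("is_custom", default=False)
     ```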
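
Custom components and extension attributes are often used together. In the sketch below, the component name `flag_shouted` and the attribute `is_shouted` are hypothetical names invented for this example; a blank pipeline is assumed.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Token

# force=True lets the script be re-run without a "already exists" error.
Token.set_extension("is_shouted", default=False, force=True)

@Language.component("flag_shouted")
def flag_shouted(doc):
    # Mark alphabetic tokens that are written entirely in upper case.
    for token in doc:
        token._.is_shouted = token.is_alpha and token.is_upper
    return doc

nlp = spacy.blank("en")
nlp.add_pipe("flag_shouted")

doc = nlp("Please STOP doing that")
shouted = [token.text for token in doc if token._.is_shouted]
print(shouted)
```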

### 15. **Documentation and Model Loading**:
   - **`spacy.info()`**: Provides information about the installed spaCy models and the environment.
   - **`spacy.util`**: Contains various utility functions for handling files, I/O, and more.
   - **`spacy.load()`**: Loads a pre-trained model (as mentioned earlier).

### 16. **Evaluation Metrics**:
   - **`spacy.scorer`**: Provides the `Scorer` class, which computes precision, recall, and F-score for tasks such as Named Entity Recognition (NER).
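
As a rough sketch of how entity scores are computed, the example below compares a hand-written "prediction" against a hand-written reference using `Scorer.score_spans`. Both annotations are invented for illustration; in practice the prediction would come from a trained pipeline.

```python
import spacy
from spacy.scorer import Scorer
from spacy.tokens import Span
from spacy.training import Example

nlp = spacy.blank("en")
text = "Apple hired Tim in London"

# Gold-standard annotation: three entities.
reference = nlp.make_doc(text)
reference.ents = [
    Span(reference, 0, 1, label="ORG"),
    Span(reference, 2, 3, label="PERSON"),
    Span(reference, 4, 5, label="GPE"),
]

# Hand-written "prediction" that misses the PERSON entity.
predicted = nlp.make_doc(text)
predicted.ents = [
    Span(predicted, 0, 1, label="ORG"),
    Span(predicted, 4, 5, label="GPE"),
]

scores = Scorer.score_spans([Example(predicted, reference)], "ents")
print(scores["ents_p"], scores["ents_r"], scores["ents_f"])
```

Both predicted entities are correct, so precision is 1.0, while one of the three reference entities is missed, so recall is 2/3.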

### 17. **Additional Tools and Utilities**:
   - **`spacy.gold`**: Handled annotated gold-standard data in spaCy v2; in v3 this functionality lives in `spacy.training`.
   - **`spacy.about`**: Holds package metadata such as the spaCy version (`spacy.about.__version__`).
   - **`spacy.cli`**: Command-line tools for managing spaCy resources.

---

### To explore more functions and details:
- **Use the `help()` function**:
  ```python
  import spacy
  help(spacy)
  ```

- **Check the official spaCy documentation**:
  Visit the [spaCy documentation](https://spacy.io/api) for a detailed API reference and function list.

The exact list of functions will vary based on your spaCy version and any extensions or custom components you may add, but this covers most of the core functionalities provided by spaCy.
