In [12]:
import pandas as pd
import numpy as np

## `pandas.Series`

```python
class pandas.Series(
    data=None, index=None, dtype=None, name=None, copy=None, fastpath=<no_default>
)
```

One-dimensional `ndarray` with axis labels (including time series).

Labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from `ndarray` have been overridden to automatically exclude missing data (currently represented as `NaN`).

Operations between Series (`+`, `-`, `/`, `*`, `**`) align values based on their associated index values—they need not be the same length. The result index will be the sorted union of the two indexes.
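The alignment rule above can be illustrated with a minimal sketch. The labels and values are made up for illustration; labels present in only one operand produce `NaN`, and reductions such as `sum()` then skip those missing values by default.

```python
import pandas as pd

# Two Series of different lengths with partially overlapping labels
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["y", "w"])

# Values are matched by label, not by position; the result index is the
# sorted union of both indexes, with NaN where a label is missing on one side
total = a + b
print(total)

# Statistical reductions exclude NaN by default (skipna=True)
print(total.sum())
```

Only the label `"y"` exists in both operands, so it is the only position that holds a number after the addition.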

### Parameters:

- **data**: `array-like`, `Iterable`, `dict`, or scalar value  
  Contains data stored in Series. If `data` is a `dict`, argument order is maintained.

- **index**: `array-like` or `Index` (1d)  
  Values must be hashable and have the same length as `data`. Non-unique index values are allowed. Will default to `RangeIndex` (0, 1, 2, …, n) if not provided. If `data` is dict-like and `index` is None, then the keys in the `data` are used as the `index`. If the `index` is not None, the resulting Series is reindexed with the index values.

- **dtype**: `str`, `numpy.dtype`, or `ExtensionDtype`, optional  
  Data type for the output Series. If not specified, this will be inferred from `data`. See the user guide for more usages.

- **name**: `Hashable`, default `None`  
  The name to give to the Series.

- **copy**: `bool`, default `False`  
  Copy input data. Only affects `Series` or 1d ndarray input. See examples.
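As a small sketch of how `data` and `index` interact when `data` is a dict (the keys and values here are illustrative only): without `index` the dict keys become the labels, while an explicit `index` selects and reorders by label and fills unmatched labels with `NaN`.

```python
import pandas as pd

data = {"a": 1, "b": 2, "c": 3}

# No index passed: the dict keys become the index
s_default = pd.Series(data)
print(s_default)

# Explicit index: the Series is reindexed, so "a" is dropped and
# "d" (absent from the dict) becomes NaN, which forces a float dtype
s_reindexed = pd.Series(data, index=["b", "c", "d"])
print(s_reindexed)
```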

In [13]:
# From an array
data_array = [1, 2, 3, 4]
s1 = pd.Series(data_array)
print(s1)

# From a dictionary
data_dict = {'a': 1, 'b': 2, 'c': 3}
s2 = pd.Series(data_dict)
print(s2)


0    1
1    2
2    3
3    4
dtype: int64
a    1
b    2
c    3
dtype: int64


In [14]:
# From an array with custom index
s3 = pd.Series(data_array, index=['A', 'B', 'C', 'D'])
print(s3)


A    1
B    2
C    3
D    4
dtype: int64


In [16]:
s4 = pd.Series(data_array, dtype='float64', name='sample_series')
print(s4)


0    1.0
1    2.0
2    3.0
3    4.0
Name: sample_series, dtype: float64


## `pandas.DataFrame`

`pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)`

Two-dimensional, size-mutable, potentially heterogeneous tabular data. Data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.

### Parameters:

- **data**: ndarray (structured or homogeneous), Iterable, dict, or DataFrame. Dict can contain Series, arrays, constants, dataclass or list-like objects. If data is a dict, column order follows insertion-order. If a dict contains Series which have an index defined, it is aligned by its index. This alignment also occurs if data is a Series or a DataFrame itself. Alignment is done on Series/DataFrame inputs. If data is a list of dicts, column order follows insertion-order.
- **index**: Index or array-like. Index to use for the resulting frame. Defaults to RangeIndex if the input data carries no indexing information and no index is provided.
- **columns**: Index or array-like. Column labels to use for resulting frame when data does not have them, defaulting to RangeIndex(0, 1, 2, …, n). If data contains column labels, will perform column selection instead.
- **dtype**: dtype, default None. Data type to force. Only a single dtype is allowed. If None, infer.
- **copy**: bool or None, default None. Copy data from inputs. For dict data, the default of None behaves like copy=True. For DataFrame or 2d ndarray input, the default of None behaves like copy=False. If data is a dict containing one or more Series (possibly of different dtypes), copy=False will ensure that these inputs are not copied.
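The `columns` and alignment behaviour described above can be sketched as follows. The column names and values are illustrative. When the dict already carries column labels, `columns` acts as a selection; Series stored in a dict are aligned on their index.

```python
import pandas as pd

data = {"col1": [1, 2, 3], "col2": [4, 5, 6], "col3": [7, 8, 9]}

# The dict keys are column labels, so `columns` selects and orders them
df = pd.DataFrame(data, columns=["col3", "col1"])
print(df)

# Series inside a dict are aligned on their index; gaps become NaN
aligned = pd.DataFrame({
    "left": pd.Series([1, 2], index=["a", "b"]),
    "right": pd.Series([3, 4], index=["b", "c"]),
})
print(aligned)
```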

In [9]:
d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
pd.DataFrame(data=d)

   col1  col2
0     1     4
1     2     5
2     3     6


In [11]:
f = {'col1': [1, 2, 3, 4, 5, 6, 7, 7, 7],
     'col2': [4, 5, 6, 7, 8, 9, 10, 11, 12],
     'col3': pd.Series([1, 2, 3], index=[0, 1, 2])}
pd.DataFrame(data=f, index=[1, 2, 3, 4, 5, 6, 7, 8, 9])

   col1  col2  col3
1     1     4   2.0
2     2     5   3.0
3     3     6   NaN
4     4     7   NaN
5     5     8   NaN
6     6     9   NaN
7     7    10   NaN
8     7    11   NaN
9     7    12   NaN


## `pandas.read_csv`

```python
pandas.read_csv(
    filepath_or_buffer,
    *,
    sep=<no_default>,
    delimiter=None,
    header='infer',
    names=<no_default>,
    index_col=None,
    usecols=None,
    dtype=None,
    engine=None,
    converters=None,
    true_values=None,
    false_values=None,
    skipinitialspace=False,
    skiprows=None,
    skipfooter=0,
    nrows=None,
    na_values=None,
    keep_default_na=True,
    na_filter=True,
    verbose=<no_default>,
    skip_blank_lines=True,
    parse_dates=None,
    infer_datetime_format=<no_default>,
    keep_date_col=<no_default>,
    date_parser=<no_default>,
    date_format=None,
    dayfirst=False,
    cache_dates=True,
    iterator=False,
    chunksize=None,
    compression='infer',
    thousands=None,
    decimal='.',
    lineterminator=None,
    quotechar='"',
    quoting=0,
    doublequote=True,
    escapechar=None,
    comment=None,
    encoding=None,
    encoding_errors='strict',
    dialect=None,
    on_bad_lines='error',
    delim_whitespace=<no_default>,
    low_memory=True,
    memory_map=False,
    float_precision=None,
    storage_options=None,
    dtype_backend=<no_default>
)
```

Read a comma-separated values (CSV) file into a DataFrame.

Also supports optionally iterating or breaking the file into chunks.

Additional help can be found in the online docs for IO Tools.
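A minimal sketch of chunked reading, assuming a small in-memory CSV (the column names `id` and `value` are made up for illustration). Passing `chunksize` makes `read_csv` return a `TextFileReader` that yields one DataFrame per chunk instead of loading everything at once.

```python
import io
import pandas as pd

csv_text = "id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n"

total = 0
chunk_sizes = []

# With chunksize set, read_csv returns an iterable TextFileReader;
# using it as a context manager closes the underlying handle afterwards
with pd.read_csv(io.StringIO(csv_text), chunksize=2) as reader:
    for chunk in reader:  # each chunk is a DataFrame with at most 2 rows
        chunk_sizes.append(len(chunk))
        total += chunk["value"].sum()

print(chunk_sizes, total)
```

The same loop works for a file path in place of the `StringIO` buffer, which is the typical case for files too large to hold in memory.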

### Parameters:

- **filepath_or_buffer**: `str`, path object or file-like object  
  Any valid string path is acceptable. The string could be a URL. Valid URL schemes include `http`, `ftp`, `s3`, `gs`, and `file`. For file URLs, a host is expected. A local file could be: `file://localhost/path/to/table.csv`.

  If you want to pass in a path object, pandas accepts any `os.PathLike`.

  By file-like object, we refer to objects with a read() method, such as a file handle (e.g., via built-in open function) or `StringIO`.

- **sep**: `str`, default `‘,’`  
  Character or regex pattern to treat as the delimiter. If `sep=None`, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator from only the first valid row of the file by Python’s built-in sniffer tool, `csv.Sniffer`. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: `'\r\t'`.

- **delimiter**: `str`, optional  
  Alias for `sep`.

- **header**: `int`, `Sequence of int`, `‘infer’` or `None`, default `‘infer’`  
  Row number(s) containing column labels and marking the start of the data (zero-indexed). Default behavior is to infer the column names: if no names are passed, the behavior is identical to `header=0` and column names are inferred from the first line of the file. If column names are passed explicitly to `names`, then the behavior is identical to `header=None`. Explicitly pass `header=0` to be able to replace existing names. The header can be a list of integers that specify row locations for a MultiIndex on the columns, e.g., `[0, 1, 3]`. Intervening rows that are not specified will be skipped (e.g., 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if `skip_blank_lines=True`, so `header=0` denotes the first line of data rather than the first line of the file.

- **names**: `Sequence of Hashable`, optional  
  Sequence of column labels to apply. If the file contains a header row, then you should explicitly pass `header=0` to override the column names. Duplicates in this list are not allowed.

- **index_col**: `Hashable`, `Sequence of Hashable` or `False`, optional  
  Column(s) to use as row label(s), denoted either by column labels or column indices. If a sequence of labels or indices is given, `MultiIndex` will be formed for the row labels.

  Note: `index_col=False` can be used to force `pandas` to not use the first column as the index, e.g., when you have a malformed file with delimiters at the end of each line.

- **usecols**: `Sequence of Hashable` or `Callable`, optional  
  Subset of columns to select, denoted either by column labels or column indices. If list-like, all elements must either be positional (i.e., integer indices into the document columns) or strings that correspond to column names provided either by the user in `names` or inferred from the document header row(s). If `names` are given, the document header row(s) are not taken into account. For example, a valid list-like `usecols` parameter would be `[0, 1, 2]` or `['foo', 'bar', 'baz']`. Element order is ignored, so `usecols=[0, 1]` is the same as `[1, 0]`. To instantiate a DataFrame from data with element order preserved use `pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']]` for columns in `['foo', 'bar']` order or `pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']]` for `['bar', 'foo']` order.

  If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be `lambda x: x.upper() in ['AAA', 'BBB', 'DDD']`. Using this parameter results in much faster parsing time and lower memory usage.

- **dtype**: `dtype` or `dict of {Hashable: dtype}`, optional  
  Data type(s) to apply to either the whole dataset or individual columns, e.g., `{'a': np.float64, 'b': np.int32, 'c': 'Int64'}`. Use `str` or `object` together with suitable `na_values` settings to preserve and not interpret dtype. If converters are specified, they will be applied instead of dtype conversion.

  Added in version 1.5.0: Support for `defaultdict` was added. Specify a `defaultdict` as input where the default determines the dtype of the columns that are not explicitly listed.

- **engine**: {‘c’, ‘python’, ‘pyarrow’}, optional  
  Parser engine to use. The `C` and `pyarrow` engines are faster, while the `python` engine is currently more feature-complete. Multithreading is currently only supported by the `pyarrow` engine.

  Added in version 1.4.0: The `‘pyarrow’` engine was added as an experimental engine, and some features are unsupported or may not work correctly with this engine.

- **converters**: `dict of {Hashable: Callable}`, optional  
  Functions for converting values in specified columns. Keys can either be column labels or column indices.

- **true_values**: `list`, optional  
  Values to consider as True in addition to case-insensitive variants of `‘True’`.

- **false_values**: `list`, optional  
  Values to consider as False in addition to case-insensitive variants of `‘False’`.

- **skipinitialspace**: `bool`, default `False`  
  Skip spaces after delimiter.

- **skiprows**: `int`, list of int or `Callable`, optional  
  Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.

  If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be `lambda x: x in [0, 2]`.

- **skipfooter**: `int`, default `0`  
  Number of lines at the bottom of the file to skip (Unsupported with `engine='c'`).

- **nrows**: `int`, optional  
  Number of rows of file to read. Useful for reading pieces of large files.

- **na_values**: `Hashable`, `Iterable of Hashable` or `dict of {Hashable: Iterable}`, optional  
  Additional strings to recognize as NA/NaN. If a dict is passed, specific per-column NA values. By default, the following values are interpreted as NaN: `""` (the empty string), “#N/A”, “#N/A N/A”, “#NA”, “-1.#IND”, “-1.#QNAN”, “-NaN”, “-nan”, “1.#IND”, “1.#QNAN”, `<NA>`, “N/A”, “NA”, “NULL”, “NaN”, “None”, “n/a”, “nan”, “null”.

- **keep_default_na**: `bool`, default `True`  
  Whether or not to include the default NaN values when parsing the data. Depending on whether `na_values` is passed in, the behavior is as follows:

  - If `keep_default_na` is True, and `na_values` are specified, `na_values` is appended to the default NaN values used for parsing.
  - If `keep_default_na` is True, and `na_values` are not specified, only the default NaN values are used for parsing.
  - If `keep_default_na` is False, and `na_values` are specified, only the NaN values specified in `na_values` are used for parsing.
  - If `keep_default_na` is False, and `na_values` are not specified, no strings will be parsed as NaN.

  Note that if `na_filter` is passed in as False, the `keep_default_na` and `na_values` parameters will be ignored.

- **na_filter**: `bool`, default `True`  
  Detect missing value markers (empty strings and the value of `na_values`). In data without any NA values, passing `na_filter=False` can improve the performance of reading a large file.

- **verbose**: `bool`, default `False`  
  Indicate the number of NA values placed in non-numeric columns.

  Deprecated since version 2.2.0.

- **skip_blank_lines**: `bool`, default `True`  
  If True, skip over blank lines rather than interpreting them as NaN values.

- **parse_dates**: `bool`, list of `Hashable`, list of lists or `dict of {Hashable: list}`, default `False`  
  The behavior is as follows:
    - `bool`. If True -> try parsing the index. Note: Automatically set to True if `date_format` or `date_parser` arguments have been passed.
    - list of int or names. e.g., If `[1, 2, 3]` -> try parsing columns 1, 2, 3 each as a separate date column.
    - list of list. e.g., If `[[1, 3]]` -> combine columns 1 and 3 and parse as a single date column. Values are joined with a space before parsing.
    - dict, e.g., `{'foo': [1, 3]}` -> parse columns 1, 3 as date and call result ‘foo’. Values are joined with a space before parsing.

  If a column or index cannot be represented as an array of datetime, say because of an unparsable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use `to_datetime()` after `read_csv()`.

  Note: A fast-path exists for ISO 8601-formatted dates.

- **infer_datetime_format**: `bool`, default `False`  
  If True and `parse_dates` is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases, this can increase the parsing speed by 5-10x.

  Deprecated since version 2.0.0: A strict version of this argument is now the default, and passing it has no effect.

- **keep_date_col**: `bool`, default `False`  
  If True and `parse_dates` specifies combining multiple columns then keep the original columns.

- **date_parser**: `function`, optional  
  Function to use for converting a sequence of string columns to an array of datetime instances. The default uses `dateutil.parser.parser` to do the conversion. Pandas will try to call `date_parser` in three different ways, advancing to the next if an exception occurs: 1) pass one or more arrays (as defined by `parse_dates`) as arguments; 2) concatenate (row-wise) the string values from the columns defined by `parse_dates` into a single array and pass that; 3) call `date_parser` once for each row using one or more strings (corresponding to the columns defined by `parse_dates`) as arguments.

  Deprecated since version 2.0.0: Use `date_format` instead, or read in as object and then apply `to_datetime()` as needed.

- **date_format**: `str` or `dict of {column: format}`, optional  
  Format to use for parsing dates when used in conjunction with `parse_dates`. For anything more complex, read the column in as object and then apply `to_datetime()` as needed.

  Added in version 2.0.0.

- **dayfirst**: `bool`, default `False`  
  DD/MM format dates, international and European format.

- **cache_dates**: `bool`, default `True`  
  If True, attempt to cache dates to speed up parsing.

- **iterator**: `bool`, default `False`  
  Return TextFileReader object for iteration or getting chunks with `get_chunk()`.

- **chunksize**: `int`, optional  
  Return TextFileReader object for iteration. See the IO Tools docs for more information on `iterator` and `chunksize`.

- **compression**: `str` or `dict`, default `'infer'`  
  For on-the-fly decompression of on-disk data. If `'infer'` and `filepath_or_buffer` is path-like, then detect compression from the following extensions: '.gz', '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2' (otherwise no decompression). If using `'zip'` or `'tar'`, the archive must contain only one data file to be read in. Set to `None` for no decompression. Can also be a dict with the key `'method'` set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'}; other key-value pairs are forwarded to the corresponding decompression class (for example `zipfile.ZipFile` or `gzip.GzipFile`).

  Added in version 1.5.0: Support for `.tar` files.

- **thousands**: `str`, optional   
  Thousands separator.

- **decimal**: `str`, default `.`  
  Character to recognize as decimal point (e.g., use ‘,’ for European data).

- **lineterminator**: `str`, optional  
  Character to break file into lines. Only valid with `C` parser.

- **quotechar**: `str`, default `"`  
  Character to recognize as the quoting character.

- **quoting**: `int`, default `0`  
  Controls when quotes should be recognized. Acceptable values are `0` (QUOTE_MINIMAL), `1` (QUOTE_ALL), `2` (QUOTE_NONNUMERIC), and `3` (QUOTE_NONE). Default is QUOTE_MINIMAL. See `csv.QUOTE_*` for more information on quoting constants.

- **doublequote**: `bool`, default `True`  
  When `quotechar` is specified and `quoting` is not `csv.QUOTE_NONE`, indicate whether or not to interpret two consecutive `quotechar` elements inside a field as a single `quotechar` element.

- **escapechar**: `str`, optional  
  One-character string used to escape other characters.

- **comment**: `str`, optional  
  Indicates the line should not be parsed. If found at the beginning of a line, the line will be skipped (e.g., `comment='#'` will skip lines starting with `#`).

- **encoding**: `str`, optional  
  Encoding to use for UTF when reading/writing (e.g., `'utf-8'`). See the list of Python standard encodings.

- **encoding_errors**: `str`, default `'strict'`  
  How encoding errors are treated. Common values are `'strict'` (raise an exception), `'ignore'` (skip undecodable bytes), and `'backslashreplace'` (replace them with backslash escape sequences). Any other error handler accepted by the `errors` argument of Python's built-in `open()` can also be used.

- **dialect**: `str` or `csv.Dialect`, optional  
  If provided, this parameter will override values (default or not) for the following parameters: `delimiter`, `doublequote`, `escapechar`, `skipinitialspace`, `quotechar`, and `quoting`. See `csv.Dialect` documentation for more information.

- **on_bad_lines**: `{'error', 'warn', 'skip'}` or `Callable`, default `'error'`   
  Specifies what to do upon encountering a bad line (i.e., a line with too many fields). Allowed values are:

  - `'error'`: raise an exception when a bad line is encountered.
  - `'warn'`: raise a warning when a bad line is encountered and skip that line.
  - `'skip'`: skip bad lines without raising or warning.
  - `Callable`: a function that processes a single bad line. With `engine='python'`, its signature is `(bad_line: list[str]) -> list[str] | None`, where `bad_line` is a list of strings split by `sep`. If the function returns `None`, the bad line is ignored. Callables are not supported by the C engine.

- **delim_whitespace**: `bool`, default `False`  
  Specifies whether or not whitespace (e.g., `' '` or `'\t'`) will be used as the `sep` delimiter. Equivalent to setting `sep='\s+'`. If this option is set to True, nothing should be passed in for the `delimiter` parameter.

  Deprecated since version 2.2.0: Use `sep='\s+'` instead.

- **low_memory**: `bool`, default `True`  
  Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To ensure no mixed types, either set `low_memory=False` or specify the column types with the `dtype` parameter. Note that the entire file is read into a single DataFrame regardless; use the `chunksize` or `iterator` parameter to return the data in chunks. (Only valid with the C parser.)

- **memory_map**: `bool`, default `False`  
  If a filepath is provided for `filepath_or_buffer`, map the file object directly onto memory and access the data directly from there. Using this option can improve performance because there is no longer any I/O overhead.

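- **float_precision**: `{'high', 'legacy', 'round_trip'}`, optional  
  Specifies which converter the C engine should use for floating-point values. The options are `None` or `'high'` for the ordinary converter, `'legacy'` for the original lower-precision pandas converter, and `'round_trip'` for the round-trip converter.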

- **storage_options**: `dict`, optional  
  Extra options that make sense for a particular storage connection, e.g., host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to `urllib.request.Request` as header options. For other URLs (e.g., starting with “s3://” and “gcs://”) the key-value pairs are forwarded to `fsspec.open`. See the `fsspec` and `urllib` documentation for more details; further examples of storage options are in the pandas IO Tools user guide.

- **dtype_backend**: `{'numpy_nullable', 'pyarrow'}`, default `'numpy_nullable'`  
  Back-end data type applied to the resultant DataFrame (still experimental). Behaviour is as follows:

  - `"numpy_nullable"`: returns nullable-dtype-backed DataFrame (default).
  - `"pyarrow"`: returns pyarrow-backed nullable `ArrowDtype` DataFrame.

  Added in version 2.0.
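Several of the parameters above are easiest to understand in combination. The following is a minimal sketch on made-up, in-memory data (the column names, the `"missing"` marker, and the file contents are all hypothetical); it combines `sep`, `comment`, `decimal`, `na_values`, `usecols`, `dtype`, and `parse_dates`, and then shows `on_bad_lines='skip'` on a separate malformed input.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated export using a decimal comma
csv_text = (
    "# exported report\n"
    "date;city;amount;note\n"
    "2024-01-05;Oslo;1,5;ok\n"
    "2024-01-06;Bergen;missing;NA\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    comment="#",                          # the first line is skipped entirely
    decimal=",",                          # "1,5" is parsed as 1.5
    na_values=["missing"],                # added to the default NaN markers
    usecols=["date", "city", "amount"],   # the "note" column is not loaded
    dtype={"city": "string"},
    parse_dates=["date"],
)
print(df)
print(df.dtypes)

# A row with too many fields is dropped instead of raising an error
bad_text = "a,b\n1,2\n3,4,5\n6,7\n"
clean = pd.read_csv(io.StringIO(bad_text), on_bad_lines="skip")
print(clean)
```

Because `comment='#'` removes the first line before the header is located, `header=0` (the default here) refers to the `date;city;amount;note` line.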

## pandas.read_excel

`pandas.read_excel(io, sheet_name=0, *, header=0, names=None, index_col=None, usecols=None, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, parse_dates=False, date_parser=<no_default>, date_format=None, thousands=None, decimal='.', comment=None, skipfooter=0, storage_options=None, dtype_backend=<no_default>, engine_kwargs=None)`

Read an Excel file into a pandas DataFrame.

Supports xls, xlsx, xlsm, xlsb, odf, ods and odt file extensions read from a local filesystem or URL. Supports an option to read a single sheet or a list of sheets.

### Parameters:

- **io**: str, bytes, ExcelFile, xlrd.Book, path object, or file-like object  
  Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: `file://localhost/path/to/table.xlsx`.  
  By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via builtin open function) or StringIO.  
  Deprecated since version 2.1.0: Passing byte strings is deprecated. To read from a byte string, wrap it in a BytesIO object.

- **sheet_name**: str, int, list, or None, default 0  
  Strings are used for sheet names. Integers are used in zero-indexed sheet positions (chart sheets do not count as a sheet position). Lists of strings/integers are used to request multiple sheets. Specify None to get all worksheets.

  Available cases:
  - Defaults to 0: 1st sheet as a DataFrame
  - 1: 2nd sheet as a DataFrame
  - "Sheet1": Load sheet with name “Sheet1”
  - [0, 1, "Sheet5"]: Load first, second and sheet named “Sheet5” as a dict of DataFrame
  - None: All worksheets.

- **header**: int, list of int, default 0  
  Row (0-indexed) to use for the column labels of the parsed DataFrame. If a list of integers is passed those row positions will be combined into a MultiIndex. Use None if there is no header.

- **names**: array-like, default None  
  List of column names to use. If file contains no header row, then you should explicitly pass `header=None`.

- **index_col**: int, str, list of int, default None  
  Column (0-indexed) to use as the row labels of the DataFrame. Pass None if there is no such column. If a list is passed, those columns will be combined into a MultiIndex. If a subset of data is selected with `usecols`, `index_col` is based on the subset.

  Missing values will be forward filled to allow roundtripping with `to_excel` for `merged_cells=True`. To avoid forward filling the missing values use `set_index` after reading the data instead of `index_col`.

- **usecols**: str, list-like, or callable, default None  
  If None, then parse all columns.

  If str, then indicates comma separated list of Excel column letters and column ranges (e.g. “A:E” or “A,C,E:F”). Ranges are inclusive of both sides.

  If a list of int, then indicates list of column numbers to be parsed (0-indexed).

  If a list of string, then indicates list of column names to be parsed.

  If callable, then evaluate each column name against it and parse the column if the callable returns True.

  Returns a subset of the columns according to behavior above.

- **dtype**: Type name or dict of column -> type, default None  
  Data type for data or columns. E.g. `{'a': np.float64, 'b': np.int32}`. Use object to preserve data as stored in Excel and not interpret dtype, which will necessarily result in object dtype. If converters are specified, they will be applied INSTEAD of dtype conversion. If you use None, it will infer the dtype of each column based on the data.

- **engine**: {'openpyxl', 'calamine', 'odf', 'pyxlsb', 'xlrd'}, default None  
  If io is not a buffer or path, this must be set to identify io. Engine compatibility:
  - `openpyxl` supports newer Excel file formats.
  - `calamine` supports Excel (.xls, .xlsx, .xlsm, .xlsb) and OpenDocument (.ods) file formats.
  - `odf` supports OpenDocument file formats (.odf, .ods, .odt).
  - `pyxlsb` supports Binary Excel files.
  - `xlrd` supports old-style Excel files (.xls).

  When engine=None, the following logic will be used to determine the engine:
  - If `path_or_buffer` is an OpenDocument format (.odf, .ods, .odt), then `odf` will be used.
  - Otherwise if `path_or_buffer` is an xls format, `xlrd` will be used.
  - Otherwise if `path_or_buffer` is in xlsb format, `pyxlsb` will be used.
  - Otherwise `openpyxl` will be used.

- **converters**: dict, default None  
  Dict of functions for converting values in certain columns. Keys can either be integers or column labels, values are functions that take one input argument, the Excel cell content, and return the transformed content.

- **true_values**: list, default None  
  Values to consider as True.

- **false_values**: list, default None  
  Values to consider as False.

- **skiprows**: list-like, int, or callable, optional  
  Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file. If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be `lambda x: x in [0, 2]`.

- **nrows**: int, default None  
  Number of rows to parse.

- **na_values**: scalar, str, list-like, or dict, default None  
  Additional strings to recognize as NA/NaN. If dict is passed, specific per-column NA values. By default the following values are interpreted as NaN: `‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘<NA>’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘None’, ‘n/a’, ‘nan’, ‘null’`.

- **keep_default_na**: bool, default True  
  Whether or not to include the default NaN values when parsing the data. Depending on whether `na_values` is passed in, the behavior is as follows:
  - If `keep_default_na` is True, and `na_values` are specified, `na_values` is appended to the default NaN values used for parsing.
  - If `keep_default_na` is True, and `na_values` are not specified, only the default NaN values are used for parsing.
  - If `keep_default_na` is False, and `na_values` are specified, only the NaN values specified in `na_values` are used for parsing.
  - If `keep_default_na` is False, and `na_values` are not specified, no strings will be parsed as NaN.

  Note that if `na_filter` is passed in as False, the `keep_default_na` and `na_values` parameters will be ignored.

- **na_filter**: bool, default True  
  Detect missing value markers (empty strings and the value of `na_values`). In data without any NAs, passing `na_filter=False` can improve the performance of reading a large file.

- **verbose**: bool, default False  
  Indicate the number of NA values placed in non-numeric columns.

- **parse_dates**: bool, list-like, or dict, default False  
  The behavior is as follows:
  - `bool`. If True -> try parsing the index.
  - list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
  - list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
  - dict, e.g. {'foo': [1, 3]} -> parse columns 1, 3 as date and call result `foo`.

  If a column or index contains an unparsable date, the entire column or index will be returned unaltered as an object data type. If you don't want to parse some cells as a date just change their type in Excel to “Text”. For non-standard datetime parsing, use `pd.to_datetime` after `pd.read_excel`.

  **Note**: A fast-path exists for iso8601-formatted dates.

- **date_parser**: function, optional  
  Function to use for converting a sequence of string columns to an array of datetime instances. The default uses `dateutil.parser.parser` to do the conversion. Pandas will try to call `date_parser` in three different ways, advancing to the next if an exception occurs:
  1. Pass one or more arrays (as defined by `parse_dates`) as arguments;
  2. concatenate (row-wise) the string values from the columns defined by `parse_dates` into a single array and pass that;
  3. call `date_parser` once for each row using one or more strings (corresponding to the columns defined by `parse_dates`) as arguments.

  Deprecated since version 2.0.0: Use `date_format` instead, or read in as object and then apply `to_datetime()` as needed.

- **date_format**: str or dict of column -> format, default None  
  If used in conjunction with `parse_dates`, will parse dates according to this format. For anything more complex, please read in as object and then apply `to_datetime()` as needed.  
  Added in version 2.0.0.

- **thousands**: str, default None  
  Thousands separator for parsing string columns to numeric. Note that this parameter is only necessary for columns stored as TEXT in Excel, any numeric columns will automatically be parsed, regardless of display format.

- **decimal**: str, default ‘.’  
  Character to recognize as the decimal point for parsing string columns to numeric. Note that this parameter is only necessary for columns stored as TEXT in Excel, any numeric columns will automatically be parsed, regardless of display format (e.g. use `,` for European data).  
  Added in version 1.4.0.

- **comment**: str, default None  
  Comments out the remainder of the line. Pass a character or characters to this argument to indicate comments in the input file. Any data between the comment string and the end of the current line is ignored.

- **skipfooter**: int, default 0  
  Rows at the end to skip (0-indexed).

- **storage_options**: dict, optional  
  Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to `urllib.request.Request` as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to `fsspec.open`. Please see `fsspec` and `urllib` for more details, and for more examples on storage options refer [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-excel).

- **dtype_backend**: {'numpy_nullable', 'pyarrow'}, default 'numpy_nullable'  
  Back-end data type applied to the resultant DataFrame (still experimental). Behaviour is as follows:
  - `"numpy_nullable"`: returns nullable-dtype-backed DataFrame (default).
  - `"pyarrow"`: returns pyarrow-backed nullable `ArrowDtype` DataFrame.

### Example:
Here is a simple example of how to read an Excel file:

```python
import pandas as pd

df = pd.read_excel('path_to_file.xlsx', sheet_name='Sheet1')
print(df.head())
```

## pandas.read_sql

`pandas.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None, dtype_backend=<no_default>, dtype=None)`

Read SQL query or database table into a DataFrame.

This function is a convenience wrapper around `read_sql_table` and `read_sql_query` (for backward compatibility). It will delegate to the specific function depending on the provided input. A SQL query will be routed to `read_sql_query`, while a database table name will be routed to `read_sql_table`. Note that the delegated function might have more specific notes about their functionality not listed here.

### Parameters:

- **sql**: str or SQLAlchemy Selectable (select or text object)  
  SQL query to be executed or a table name.

- **con**: ADBC Connection, SQLAlchemy connectable, str, or sqlite3 connection  
  ADBC provides high performance I/O with native type support, where available. Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported. The user is responsible for engine disposal and connection closure for the ADBC connection and SQLAlchemy connectable; str connections are closed automatically. See [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql).

- **index_col**: str or list of str, optional, default None  
  Column(s) to set as index (MultiIndex).

- **coerce_float**: bool, default True  
  Attempts to convert values of non-string, non-numeric objects (like `decimal.Decimal`) to floating point, useful for SQL result sets.

- **params**: list, tuple or dict, optional, default None  
  List of parameters to pass to execute method. The syntax used to pass parameters is database driver dependent. Check your database driver documentation for which of the five syntax styles, described in PEP 249’s `paramstyle`, is supported. E.g., for `psycopg2`, use `%(name)s` so use `params={'name': 'value'}`.

- **parse_dates**: list or dict, default None  
  List of column names to parse as dates.  
  Dict of `{column_name: format string}` where format string is `strftime` compatible in case of parsing string times, or is one of (`D`, `s`, `ns`, `ms`, `us`) in case of parsing integer timestamps.  
  Dict of `{column_name: arg dict}`, where the arg dict corresponds to the keyword arguments of `pandas.to_datetime()`. Especially useful with databases without native Datetime support, such as SQLite.

- **columns**: list, default None  
  List of column names to select from SQL table (only used when reading a table).

- **chunksize**: int, default None  
  If specified, return an iterator where `chunksize` is the number of rows to include in each chunk.

- **dtype_backend**: {'numpy_nullable', 'pyarrow'}, default 'numpy_nullable'  
  Back-end data type applied to the resultant DataFrame (still experimental). Behaviour is as follows:
  - `"numpy_nullable"`: returns nullable-dtype-backed DataFrame (default).
  - `"pyarrow"`: returns pyarrow-backed nullable `ArrowDtype` DataFrame.  
  Added in version 2.0.

- **dtype**: Type name or dict of columns  
  Data type for data or columns. E.g. `np.float64` or `{'a': np.float64, 'b': np.int32, 'c': 'Int64'}`. The argument is ignored if a table is passed instead of a query.

### Example:
Here is a simple example of how to read from an SQL table:

```python
import pandas as pd
from sqlalchemy import create_engine

# Create an engine instance
engine = create_engine('sqlite:///example.db')

# Read data from SQL table
df = pd.read_sql('table_name', con=engine)
print(df.head())
```
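For a variant that runs without SQLAlchemy, here is a minimal sketch using the standard-library `sqlite3` module with an in-memory database. The `users` table and its rows are invented for illustration. With a plain `sqlite3` connection, `sql` should be a query rather than a bare table name, and `params` uses the `?` placeholder style of `sqlite3`.

```python
import sqlite3
import pandas as pd

# In-memory database with a small, made-up table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
con.executemany(
    "INSERT INTO users (id, name, age) VALUES (?, ?, ?)",
    [(1, "Ada", 36), (2, "Linus", 28), (3, "Grace", 45)],
)
con.commit()

# Parameters are passed separately from the SQL text instead of being
# formatted into the string; index_col turns the "id" column into the index
df = pd.read_sql(
    "SELECT id, name, age FROM users WHERE age > ? ORDER BY id",
    con,
    params=(30,),
    index_col="id",
)
print(df)

# The caller is responsible for closing the connection
con.close()
```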