# About Pandas

In [None]:
"""
Pandas is a popular open-source Python library for 
data manipulation and analysis.
It provides easy-to-use data structures 
and functions for working with structured data,
such as tables and time series. 
Here are some of the key components and 
features of the Pandas library:

1. Data Structures:

- DataFrame: 
A two-dimensional table with rows and columns, 
similar to a spreadsheet or SQL table.
- Series: 
A one-dimensional labeled array that 
can hold various data types.

2. Data Input and Output:

Pandas can read and write data from/to 
various file formats, including CSV,
Excel, SQL databases, JSON, and more.

3. Data Cleaning and Preprocessing:

You can handle missing data and duplicate rows
and perform other data cleaning operations.
It provides methods for filtering, 
sorting, and transforming data.

4. Data Selection and Indexing:

Pandas offers various ways to select 
and filter data, including label-based indexing,
integer-based indexing, and boolean indexing.

5. Aggregation and Grouping:

You can perform operations like grouping, 
aggregation, and pivot tables.
It supports various statistical and 
mathematical operations.

6. Time Series Analysis:

Pandas has extensive support for time series data,
including date ranges, resampling, and time zone handling.

7. Merging and Joining Data:

You can combine data from different sources
using methods like merge and join.

8. Visualization:

While not a primary data visualization library,
Pandas can integrate with libraries like 
Matplotlib for basic data visualization.
"""

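The components listed above can be sketched in a few lines. This is a minimal illustration, not part of the original class material: the fruit, store, and price values are made up, and all variable names are hypothetical.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "banana", "cherry"], name="price")

# A DataFrame is a two-dimensional table; each column is a Series.
# (Illustrative data only.)
sales = pd.DataFrame(
    {
        "fruit": ["apple", "banana", "apple", "cherry"],
        "store": ["north", "north", "south", "south"],
        "units": [10, 4, 6, 20],
    }
)

# Selection: label-based, integer-based, and boolean indexing.
first_fruit = sales.loc[0, "fruit"]
last_units = sales.iloc[-1, 2]
big_orders = sales[sales["units"] >= 10]

# Grouping and aggregation: total units per fruit.
units_per_fruit = sales.groupby("fruit")["units"].sum()

# Merging: attach the unit price to each sale, then compute revenue.
price_table = prices.rename_axis("fruit").reset_index()
merged = sales.merge(price_table, on="fruit", how="left")
merged["revenue"] = merged["units"] * merged["price"]

print(units_per_fruit)
print(merged)
```
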
In [1]:
# To use Pandas, you first need to install it, typically using pip:
!pip install pandas




In [2]:
# Once installed, you can import it 
# into your Python script or Jupyter Notebook:
import pandas as pd

# Pandas File Read and Write Operations

In [4]:
"""
Pandas Read Operations

1. read_csv():

    Parameters:
        - filepath_or_buffer: The file path or a URL to the CSV file.
        - sep: Delimiter used in the CSV file (default is ,).
        - header: Row(s) to use as the column names.
        - index_col: Column(s) to use as the row labels of the DataFrame.
        Many other options to control parsing and data interpretation.

2. read_excel():

    Parameters:
        - io: The file path or a URL to the Excel file.
        - sheet_name: Name or index of the sheet to read.
        - header: Row(s) to use as the column names.
        - index_col: Column(s) to use as the row labels of the DataFrame.
        Additional options for controlling Excel parsing and data interpretation.

3. read_sql():

    Parameters:
        - sql: SQL query to execute.
        - con: SQLAlchemy engine or database connection.
        - index_col: Column(s) to use as the row labels of the DataFrame.
        - params: Parameters to pass to the SQL query.
        Various other SQL-related options.

4. read_json():

    Parameters:
        - path_or_buf: The file path, URL, or JSON string.
        - orient: The orientation of the JSON data (e.g., 'split', 'records', 'index', 'columns').
        Various other options for JSON data interpretation.

5. read_html():

    Parameters:
        - io: The URL or HTML string to parse.
        - match: Match partial string to use for finding HTML tables.
        - header: Row(s) to use as the column names.
        Many other HTML parsing options.

6. read_clipboard():

    Parameters:
        - sep: Delimiter used to separate columns (default is one or more whitespace characters).
        - header: Row(s) to use as the column names.

7. read_hdf():

    Parameters:
        - path_or_buf: The file path or buffer to read.
        - key: The group identifier in the HDF5 file.
        - columns: A list of columns to read.
        Various options for handling data and metadata.

8. read_feather():

    Parameters:
        - path: The file path to read.
        - columns: A list of columns to read.

9. read_parquet():

    Parameters:
        - path: The file path to read.
        - engine: The Parquet reader library to use ('pyarrow' or 'fastparquet').
        - columns: A list of columns to read.

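10. read_msgpack():

    Note: deprecated in pandas 0.25 and removed in pandas 1.0,
    so it is not available in current releases.

    Parameters:
        - path_or_buf: The file path, URL, or binary stream.
        - encoding: The character encoding to use.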

11. read_stata():

    Parameters:
        - filepath_or_buffer: The file path or buffer to read.
        - convert_categoricals: Whether to convert Stata categorical data to Pandas categoricals.

12. read_sas():

    Parameters:
        - filepath_or_buffer: The file path or buffer to read.
        - format: The SAS file format (e.g., 'xport', 'sas7bdat').

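13. read_gbq():

    Note: relies on the separate pandas-gbq package; pandas 2.2 deprecates
    pd.read_gbq in favor of pandas_gbq.read_gbq.

    Parameters:
        - query: The SQL query to execute in Google BigQuery.
        - project_id: The Google Cloud project ID.
        Many other options for authentication and query configuration.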

14. read_spss():

    Parameters:
        - path: The file path to read.
        - usecols: A list of columns to read.

15. read_pickle():

    Parameters:
        - path: The file path to read.

16. read_fwf():

    Parameters:
        - filepath_or_buffer: The file path or buffer to read.
        - colspecs: A list of tuples specifying column positions.
        Many other options for fixed-width format parsing.

17. read_sql_table():

    Parameters:
        - table_name: The SQL table name to read.
        - con: SQLAlchemy engine or database connection.
        - index_col: Column(s) to use as the row labels of the DataFrame.
        Various other SQL-related options.
"""

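A few of the readers listed above can be tried without any files on disk. The sketch below is only an illustration under assumptions: the sample text, the `scores` table, and all variable names are invented, and in-memory buffers plus an in-memory SQLite database stand in for real files and servers.

```python
import io
import sqlite3

import pandas as pd

# read_csv with a non-default delimiter and an index column.
csv_text = "id;name;score\n1;Ann;9.5\n2;Bob;7.0\n"
csv_df = pd.read_csv(io.StringIO(csv_text), sep=";", index_col="id")

# read_json with records-oriented data, wrapped in a file-like buffer.
json_text = '[{"name": "Ann", "score": 9.5}, {"name": "Bob", "score": 7.0}]'
json_df = pd.read_json(io.StringIO(json_text), orient="records")

# read_fwf with explicit column positions (half-open [start, end) intervals).
fwf_text = "Ann  9.5\nBob  7.0\n"
fwf_df = pd.read_fwf(
    io.StringIO(fwf_text), colspecs=[(0, 3), (5, 8)], names=["name", "score"]
)

# read_sql against an in-memory SQLite database, with a bound query parameter.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (name TEXT, score REAL)")
con.executemany("INSERT INTO scores VALUES (?, ?)", [("Ann", 9.5), ("Bob", 7.0)])
sql_df = pd.read_sql("SELECT name, score FROM scores WHERE score > ?", con, params=(8,))
con.close()

print(csv_df, json_df, fwf_df, sql_df, sep="\n\n")
```
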
# Reading Files

In [3]:
ls

 Volume in drive C has no label.
 Volume Serial Number is 6A09-6524

 Directory of C:\Users\win\Ineuron Classes

27-10-2023  16:00    <DIR>          .
20-10-2023  23:27    <DIR>          ..
17-10-2023  14:41    <DIR>          .ipynb_checkpoints
20-09-2023  18:18    <DIR>          __pycache__
27-05-2022  11:53             1,173 addresses.csv
22-05-2023  22:22            51,867 car.data
06-12-2022  21:39             5,497 Class 01.ipynb
09-12-2022  22:47            24,947 Class 02.ipynb
20-09-2023  15:38            35,934 Class 03 - Tuple, Set and Dict.ipynb
24-12-2022  00:25            18,225 Class 04 - If, Else & For loop.ipynb
12-03-2023  22:30            17,480 Class 05 - For-else and while loop.ipynb
11-03-2023  22:51            19,426 Class 06 - Loops in details.ipynb
12-03-2023  23:55            25,172 Class 07 - Functions 1.ipynb
20-09-2023  17:44            49,761 Class 08 - Functions 2.ipynb
17-09-2023  02:53            43,389 Class 09 - Iterator Generator and File System.ipynb

In [6]:
"""
Signature:
pd.read_csv(
    filepath_or_buffer: 'FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str]',
    *,
    sep: 'str | None | lib.NoDefault' = <no_default>,
    delimiter: 'str | None | lib.NoDefault' = None,
    header: "int | Sequence[int] | None | Literal['infer']" = 'infer',
    names: 'Sequence[Hashable] | None | lib.NoDefault' = <no_default>,
    index_col: 'IndexLabel | Literal[False] | None' = None,
    usecols=None,
    squeeze: 'bool | None' = None,
    prefix: 'str | lib.NoDefault' = <no_default>,
    mangle_dupe_cols: 'bool' = True,
    dtype: 'DtypeArg | None' = None,
    engine: 'CSVEngine | None' = None,
    converters=None,
    true_values=None,
    false_values=None,
    skipinitialspace: 'bool' = False,
    skiprows=None,
    skipfooter: 'int' = 0,
    nrows: 'int | None' = None,
    na_values=None,
    keep_default_na: 'bool' = True,
    na_filter: 'bool' = True,
    verbose: 'bool' = False,
    skip_blank_lines: 'bool' = True,
    parse_dates=None,
    infer_datetime_format: 'bool' = False,
    keep_date_col: 'bool' = False,
    date_parser=None,
    dayfirst: 'bool' = False,
    cache_dates: 'bool' = True,
    iterator: 'bool' = False,
    chunksize: 'int | None' = None,
    compression: 'CompressionOptions' = 'infer',
    thousands: 'str | None' = None,
    decimal: 'str' = '.',
    lineterminator: 'str | None' = None,
    quotechar: 'str' = '"',
    quoting: 'int' = 0,
    doublequote: 'bool' = True,
    escapechar: 'str | None' = None,
    comment: 'str | None' = None,
    encoding: 'str | None' = None,
    encoding_errors: 'str | None' = 'strict',
    dialect: 'str | csv.Dialect | None' = None,
    error_bad_lines: 'bool | None' = None,
    warn_bad_lines: 'bool | None' = None,
    on_bad_lines=None,
    delim_whitespace: 'bool' = False,
    low_memory=True,
    memory_map: 'bool' = False,
    float_precision: "Literal['high', 'legacy'] | None" = None,
    storage_options: 'StorageOptions' = None,
) -> 'DataFrame | TextFileReader'
Docstring:
Read a comma-separated values (csv) file into DataFrame.

Also supports optionally iterating or breaking of the file
into chunks.

Additional help can be found in the online docs for
`IO Tools <https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html>`_.

Parameters
----------
filepath_or_buffer : str, path object or file-like object
    Any valid string path is acceptable. The string could be a URL. Valid
    URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is
    expected. A local file could be: file://localhost/path/to/table.csv.

    If you want to pass in a path object, pandas accepts any ``os.PathLike``.

    By file-like object, we refer to objects with a ``read()`` method, such as
    a file handle (e.g. via builtin ``open`` function) or ``StringIO``.
sep : str, default ','
    Delimiter to use. If sep is None, the C engine cannot automatically detect
    the separator, but the Python parsing engine can, meaning the latter will
    be used and automatically detect the separator by Python's builtin sniffer
    tool, ``csv.Sniffer``. In addition, separators longer than 1 character and
    different from ``'\s+'`` will be interpreted as regular expressions and
    will also force the use of the Python parsing engine. Note that regex
    delimiters are prone to ignoring quoted data. Regex example: ``'\r\t'``.
delimiter : str, default ``None``
    Alias for sep.
header : int, list of int, None, default 'infer'
    Row number(s) to use as the column names, and the start of the
    data.  Default behavior is to infer the column names: if no names
    are passed the behavior is identical to ``header=0`` and column
    names are inferred from the first line of the file, if column
    names are passed explicitly then the behavior is identical to
    ``header=None``. Explicitly pass ``header=0`` to be able to
    replace existing names. The header can be a list of integers that
    specify row locations for a multi-index on the columns
    e.g. [0,1,3]. Intervening rows that are not specified will be
    skipped (e.g. 2 in this example is skipped). Note that this
    parameter ignores commented lines and empty lines if
    ``skip_blank_lines=True``, so ``header=0`` denotes the first line of
    data rather than the first line of the file.
names : array-like, optional
    List of column names to use. If the file contains a header row,
    then you should explicitly pass ``header=0`` to override the column names.
    Duplicates in this list are not allowed.
index_col : int, str, sequence of int / str, or False, optional, default ``None``
  Column(s) to use as the row labels of the ``DataFrame``, either given as
  string name or column index. If a sequence of int / str is given, a
  MultiIndex is used.

  Note: ``index_col=False`` can be used to force pandas to *not* use the first
  column as the index, e.g. when you have a malformed file with delimiters at
  the end of each line.
usecols : list-like or callable, optional
    Return a subset of the columns. If list-like, all elements must either
    be positional (i.e. integer indices into the document columns) or strings
    that correspond to column names provided either by the user in `names` or
    inferred from the document header row(s). If ``names`` are given, the document
    header row(s) are not taken into account. For example, a valid list-like
    `usecols` parameter would be ``[0, 1, 2]`` or ``['foo', 'bar', 'baz']``.
    Element order is ignored, so ``usecols=[0, 1]`` is the same as ``[1, 0]``.
    To instantiate a DataFrame from ``data`` with element order preserved use
    ``pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']]`` for columns
    in ``['foo', 'bar']`` order or
    ``pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']]``
    for ``['bar', 'foo']`` order.

    If callable, the callable function will be evaluated against the column
    names, returning names where the callable function evaluates to True. An
    example of a valid callable argument would be ``lambda x: x.upper() in
    ['AAA', 'BBB', 'DDD']``. Using this parameter results in much faster
    parsing time and lower memory usage.
squeeze : bool, default False
    If the parsed data only contains one column then return a Series.

    .. deprecated:: 1.4.0
        Append ``.squeeze("columns")`` to the call to ``read_csv`` to squeeze
        the data.
prefix : str, optional
    Prefix to add to column numbers when no header, e.g. 'X' for X0, X1, ...

    .. deprecated:: 1.4.0
       Use a list comprehension on the DataFrame's columns after calling ``read_csv``.
mangle_dupe_cols : bool, default True
    Duplicate columns will be specified as 'X', 'X.1', ...'X.N', rather than
    'X'...'X'. Passing in False will cause data to be overwritten if there
    are duplicate names in the columns.

    .. deprecated:: 1.5.0
        Not implemented, and a new argument to specify the pattern for the
        names of duplicated columns will be added instead
dtype : Type name or dict of column -> type, optional
    Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32,
    'c': 'Int64'}
    Use `str` or `object` together with suitable `na_values` settings
    to preserve and not interpret dtype.
    If converters are specified, they will be applied INSTEAD
    of dtype conversion.

    .. versionadded:: 1.5.0

        Support for defaultdict was added. Specify a defaultdict as input where
        the default determines the dtype of the columns which are not explicitly
        listed.
engine : {'c', 'python', 'pyarrow'}, optional
    Parser engine to use. The C and pyarrow engines are faster, while the python engine
    is currently more feature-complete. Multithreading is currently only supported by
    the pyarrow engine.

    .. versionadded:: 1.4.0

        The "pyarrow" engine was added as an *experimental* engine, and some features
        are unsupported, or may not work correctly, with this engine.
converters : dict, optional
    Dict of functions for converting values in certain columns. Keys can either
    be integers or column labels.
true_values : list, optional
    Values to consider as True.
false_values : list, optional
    Values to consider as False.
skipinitialspace : bool, default False
    Skip spaces after delimiter.
skiprows : list-like, int or callable, optional
    Line numbers to skip (0-indexed) or number of lines to skip (int)
    at the start of the file.

    If callable, the callable function will be evaluated against the row
    indices, returning True if the row should be skipped and False otherwise.
    An example of a valid callable argument would be ``lambda x: x in [0, 2]``.
skipfooter : int, default 0
    Number of lines at bottom of file to skip (Unsupported with engine='c').
nrows : int, optional
    Number of rows of file to read. Useful for reading pieces of large files.
na_values : scalar, str, list-like, or dict, optional
    Additional strings to recognize as NA/NaN. If dict passed, specific
    per-column NA values.  By default the following values are interpreted as
    NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan',
    '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a',
    'nan', 'null'.
keep_default_na : bool, default True
    Whether or not to include the default NaN values when parsing the data.
    Depending on whether `na_values` is passed in, the behavior is as follows:

    * If `keep_default_na` is True, and `na_values` are specified, `na_values`
      is appended to the default NaN values used for parsing.
    * If `keep_default_na` is True, and `na_values` are not specified, only
      the default NaN values are used for parsing.
    * If `keep_default_na` is False, and `na_values` are specified, only
      the NaN values specified `na_values` are used for parsing.
    * If `keep_default_na` is False, and `na_values` are not specified, no
      strings will be parsed as NaN.

    Note that if `na_filter` is passed in as False, the `keep_default_na` and
    `na_values` parameters will be ignored.
na_filter : bool, default True
    Detect missing value markers (empty strings and the value of na_values). In
    data without any NAs, passing na_filter=False can improve the performance
    of reading a large file.
verbose : bool, default False
    Indicate number of NA values placed in non-numeric columns.
skip_blank_lines : bool, default True
    If True, skip over blank lines rather than interpreting as NaN values.
parse_dates : bool or list of int or names or list of lists or dict, default False
    The behavior is as follows:

    * boolean. If True -> try parsing the index.
    * list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3
      each as a separate date column.
    * list of lists. e.g.  If [[1, 3]] -> combine columns 1 and 3 and parse as
      a single date column.
    * dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call
      result 'foo'

    If a column or index cannot be represented as an array of datetimes,
    say because of an unparsable value or a mixture of timezones, the column
    or index will be returned unaltered as an object data type. For
    non-standard datetime parsing, use ``pd.to_datetime`` after
    ``pd.read_csv``. To parse an index or column with a mixture of timezones,
    specify ``date_parser`` to be a partially-applied
    :func:`pandas.to_datetime` with ``utc=True``. See
    :ref:`io.csv.mixed_timezones` for more.

    Note: A fast-path exists for iso8601-formatted dates.
infer_datetime_format : bool, default False
    If True and `parse_dates` is enabled, pandas will attempt to infer the
    format of the datetime strings in the columns, and if it can be inferred,
    switch to a faster method of parsing them. In some cases this can increase
    the parsing speed by 5-10x.
keep_date_col : bool, default False
    If True and `parse_dates` specifies combining multiple columns then
    keep the original columns.
date_parser : function, optional
    Function to use for converting a sequence of string columns to an array of
    datetime instances. The default uses ``dateutil.parser.parser`` to do the
    conversion. Pandas will try to call `date_parser` in three different ways,
    advancing to the next if an exception occurs: 1) Pass one or more arrays
    (as defined by `parse_dates`) as arguments; 2) concatenate (row-wise) the
    string values from the columns defined by `parse_dates` into a single array
    and pass that; and 3) call `date_parser` once for each row using one or
    more strings (corresponding to the columns defined by `parse_dates`) as
    arguments.
dayfirst : bool, default False
    DD/MM format dates, international and European format.
cache_dates : bool, default True
    If True, use a cache of unique, converted dates to apply the datetime
    conversion. May produce significant speed-up when parsing duplicate
    date strings, especially ones with timezone offsets.

    .. versionadded:: 0.25.0
iterator : bool, default False
    Return TextFileReader object for iteration or getting chunks with
    ``get_chunk()``.

    .. versionchanged:: 1.2

       ``TextFileReader`` is a context manager.
chunksize : int, optional
    Return TextFileReader object for iteration.
    See the `IO Tools docs
    <https://pandas.pydata.org/pandas-docs/stable/io.html#io-chunking>`_
    for more information on ``iterator`` and ``chunksize``.

    .. versionchanged:: 1.2

       ``TextFileReader`` is a context manager.
compression : str or dict, default 'infer'
    For on-the-fly decompression of on-disk data. If 'infer' and 'filepath_or_buffer' is
    path-like, then detect compression from the following extensions: '.gz',
    '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'
    (otherwise no compression).
    If using 'zip' or 'tar', the ZIP file must contain only one data file to be read in.
    Set to ``None`` for no decompression.
    Can also be a dict with key ``'method'`` set
    to one of {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'tar'``} and other
    key-value pairs are forwarded to
    ``zipfile.ZipFile``, ``gzip.GzipFile``,
    ``bz2.BZ2File``, ``zstandard.ZstdDecompressor`` or
    ``tarfile.TarFile``, respectively.
    As an example, the following could be passed for Zstandard decompression using a
    custom compression dictionary:
    ``compression={'method': 'zstd', 'dict_data': my_compression_dict}``.

        .. versionadded:: 1.5.0
            Added support for `.tar` files.

    .. versionchanged:: 1.4.0 Zstandard support.

thousands : str, optional
    Thousands separator.
decimal : str, default '.'
    Character to recognize as decimal point (e.g. use ',' for European data).
lineterminator : str (length 1), optional
    Character to break file into lines. Only valid with C parser.
quotechar : str (length 1), optional
    The character used to denote the start and end of a quoted item. Quoted
    items can include the delimiter and it will be ignored.
quoting : int or csv.QUOTE_* instance, default 0
    Control field quoting behavior per ``csv.QUOTE_*`` constants. Use one of
    QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).
doublequote : bool, default ``True``
   When quotechar is specified and quoting is not ``QUOTE_NONE``, indicate
   whether or not to interpret two consecutive quotechar elements INSIDE a
   field as a single ``quotechar`` element.
escapechar : str (length 1), optional
    One-character string used to escape other characters.
comment : str, optional
    Indicates remainder of line should not be parsed. If found at the beginning
    of a line, the line will be ignored altogether. This parameter must be a
    single character. Like empty lines (as long as ``skip_blank_lines=True``),
    fully commented lines are ignored by the parameter `header` but not by
    `skiprows`. For example, if ``comment='#'``, parsing
    ``#empty\na,b,c\n1,2,3`` with ``header=0`` will result in 'a,b,c' being
    treated as the header.
encoding : str, optional
    Encoding to use for UTF when reading/writing (ex. 'utf-8'). `List of Python
    standard encodings
    <https://docs.python.org/3/library/codecs.html#standard-encodings>`_ .

    .. versionchanged:: 1.2

       When ``encoding`` is ``None``, ``errors="replace"`` is passed to
       ``open()``. Otherwise, ``errors="strict"`` is passed to ``open()``.
       This behavior was previously only the case for ``engine="python"``.

    .. versionchanged:: 1.3.0

       ``encoding_errors`` is a new argument. ``encoding`` has no longer an
       influence on how encoding errors are handled.

encoding_errors : str, optional, default "strict"
    How encoding errors are treated. `List of possible values
    <https://docs.python.org/3/library/codecs.html#error-handlers>`_ .

    .. versionadded:: 1.3.0

dialect : str or csv.Dialect, optional
    If provided, this parameter will override values (default or not) for the
    following parameters: `delimiter`, `doublequote`, `escapechar`,
    `skipinitialspace`, `quotechar`, and `quoting`. If it is necessary to
    override values, a ParserWarning will be issued. See csv.Dialect
    documentation for more details.
error_bad_lines : bool, optional, default ``None``
    Lines with too many fields (e.g. a csv line with too many commas) will by
    default cause an exception to be raised, and no DataFrame will be returned.
    If False, then these "bad lines" will be dropped from the DataFrame that is
    returned.

    .. deprecated:: 1.3.0
       The ``on_bad_lines`` parameter should be used instead to specify behavior upon
       encountering a bad line instead.
warn_bad_lines : bool, optional, default ``None``
    If error_bad_lines is False, and warn_bad_lines is True, a warning for each
    "bad line" will be output.

    .. deprecated:: 1.3.0
       The ``on_bad_lines`` parameter should be used instead to specify behavior upon
       encountering a bad line instead.
on_bad_lines : {'error', 'warn', 'skip'} or callable, default 'error'
    Specifies what to do upon encountering a bad line (a line with too many fields).
    Allowed values are :

        - 'error', raise an Exception when a bad line is encountered.
        - 'warn', raise a warning when a bad line is encountered and skip that line.
        - 'skip', skip bad lines without raising or warning when they are encountered.

    .. versionadded:: 1.3.0

    .. versionadded:: 1.4.0

        - callable, function with signature
          ``(bad_line: list[str]) -> list[str] | None`` that will process a single
          bad line. ``bad_line`` is a list of strings split by the ``sep``.
          If the function returns ``None``, the bad line will be ignored.
          If the function returns a new list of strings with more elements than
          expected, a ``ParserWarning`` will be emitted while dropping extra elements.
          Only supported when ``engine="python"``

delim_whitespace : bool, default False
    Specifies whether or not whitespace (e.g. ``' '`` or ``'    '``) will be
    used as the sep. Equivalent to setting ``sep='\s+'``. If this option
    is set to True, nothing should be passed in for the ``delimiter``
    parameter.
low_memory : bool, default True
    Internally process the file in chunks, resulting in lower memory use
    while parsing, but possibly mixed type inference.  To ensure no mixed
    types either set False, or specify the type with the `dtype` parameter.
    Note that the entire file is read into a single DataFrame regardless,
    use the `chunksize` or `iterator` parameter to return the data in chunks.
    (Only valid with C parser).
memory_map : bool, default False
    If a filepath is provided for `filepath_or_buffer`, map the file object
    directly onto memory and access the data directly from there. Using this
    option can improve performance because there is no longer any I/O overhead.
float_precision : str, optional
    Specifies which converter the C engine should use for floating-point
    values. The options are ``None`` or 'high' for the ordinary converter,
    'legacy' for the original lower precision pandas converter, and
    'round_trip' for the round-trip converter.

    .. versionchanged:: 1.2

storage_options : dict, optional
    Extra options that make sense for a particular storage connection, e.g.
    host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
    are forwarded to ``urllib.request.Request`` as header options. For other
    URLs (e.g. starting with "s3://", and "gcs://") the key-value pairs are
    forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more
    details, and for more examples on storage options refer `here
    <https://pandas.pydata.org/docs/user_guide/io.html?
    highlight=storage_options#reading-writing-remote-files>`_.

    .. versionadded:: 1.2

Returns
-------
DataFrame or TextParser
    A comma-separated values (csv) file is returned as two-dimensional
    data structure with labeled axes.

See Also
--------
DataFrame.to_csv : Write DataFrame to a comma-separated values (csv) file.
read_csv : Read a comma-separated values (csv) file into DataFrame.
read_fwf : Read a table of fixed-width formatted lines into DataFrame.
"""

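Several of the `read_csv` parameters documented above are easier to follow in combination. The following is a small sketch on made-up data (the column names and the `"missing"` marker are assumptions for illustration), using `comment`, `usecols`, `na_values`, `parse_dates`, `dtype`, and `chunksize`.

```python
import io

import pandas as pd

# Hypothetical CSV content; the first line is a comment, not data.
csv_text = (
    "# exported by a hypothetical billing tool\n"
    "customer,signup,plan,charges\n"
    "a1,2023-01-05,basic,29.85\n"
    "b2,2023-02-11,pro,missing\n"
    "c3,2023-03-20,basic,53.85\n"
    "d4,2023-04-02,pro,70.70\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    comment="#",                               # ignore the commented first line
    usecols=["customer", "signup", "charges"],  # read only a subset of columns
    na_values=["missing"],                      # treat this marker as NaN
    parse_dates=["signup"],                     # convert to datetime
    dtype={"customer": "string"},               # force a dtype for one column
)
print(df.dtypes)

# chunksize returns a TextFileReader that yields DataFrames of at most 2 rows each.
total = 0.0
with pd.read_csv(
    io.StringIO(csv_text), comment="#", na_values=["missing"], chunksize=2
) as reader:
    for chunk in reader:
        total += chunk["charges"].sum()  # sum() skips NaN by default
print(total)
```

Reading in chunks keeps only one piece of the file in memory at a time, which is the usual reason to reach for `chunksize` on large files.
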
In [4]:
# To read a CSV file

pd.read_csv('Customer_Churn.csv')

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.30,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.70,151.65,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,6840-RESVB,Male,0,Yes,Yes,24,Yes,Yes,DSL,Yes,...,Yes,Yes,Yes,Yes,One year,Yes,Mailed check,84.80,1990.5,No
7039,2234-XADUH,Female,0,Yes,Yes,72,Yes,Yes,Fiber optic,No,...,Yes,No,Yes,Yes,One year,Yes,Credit card (automatic),103.20,7362.9,No
7040,4801-JZAZL,Female,0,Yes,Yes,11,No,No phone service,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.60,346.45,No
7041,8361-LTMKD,Male,1,Yes,No,4,Yes,Yes,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Mailed check,74.40,306.6,Yes


In [5]:
# Reading files from a different location
pd.read_csv("C:\Users\win\Downloads\Compressed\ohana-api-master\ohana-api-master\data\sample-csv\services.csv")

# This raises a SyntaxError: in a normal string literal the backslash starts an
# escape sequence, and \U (as in C:\Users) is read as the start of a Unicode escape.
# Pass the path as a raw string instead, as in the next cell.

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape (315189030.py, line 2)

In [6]:
# Passing the location as a raw string
pd.read_csv(r'C:\Users\win\Downloads\Compressed\ohana-api-master\ohana-api-master\data\sample-csv\services.csv')

Unnamed: 0,id,location_id,program_id,accepted_payments,alternate_name,application_process,audience,description,eligibility,email,...,interpretation_services,keywords,languages,name,required_documents,service_areas,status,wait_time,website,taxonomy_ids
0,1,1,,,,Walk in or apply by phone.,"Older adults age 55 or over, ethnic minorities...",A walk-in center for older adults that provide...,"Age 55 or over for most programs, age 60 or ov...",,...,,"ADULT PROTECTION AND CARE SERVICES, Meal Sites...",,Fair Oaks Adult Activity Center,,Colma,active,No wait.,,
1,2,2,,,,Apply by phone for an appointment.,Residents of San Mateo County age 55 or over,Provides training and job placement to eligibl...,"Age 55 or over, county resident and willing an...",,...,,"EMPLOYMENT/TRAINING SERVICES, Job Development,...",,Second Career Employment Program,,San Mateo County,active,Varies.,,
2,3,3,,,,Phone for information (403-4300 Ext. 4322).,Older adults age 55 or over who can benefit fr...,Offers supportive counseling services to San M...,Resident of San Mateo County age 55 or over,,...,,"Geriatric Counseling, Older Adults, Gay, Lesbi...",,Senior Peer Counseling,,San Mateo County,active,Varies.,,
3,4,4,,,,Apply by phone.,"Parents, children, families with problems of c...",Provides supervised visitation services and a ...,,,...,,"INDIVIDUAL AND FAMILY DEVELOPMENT SERVICES, Gr...",,Family Visitation Center,,San Mateo County,active,No wait.,,
4,5,5,,,,Phone for information.,Low-income working families with children tran...,Provides fixed 8% short term loans to eligible...,Eligibility: Low-income family with legal cust...,,...,,"COMMUNITY SERVICES, Speakers, Automobile Loans",,Economic Self-Sufficiency Program,,San Mateo County,active,,,
5,6,6,,,,Walk in or apply by phone for membership appli...,Any age,A multipurpose center offering a wide variety ...,,,...,,"ADULT PROTECTION AND CARE SERVICES, In-Home Su...",,Little House Recreational Activities,,San Mateo County,active,No wait.,,
6,7,7,,,,"Apply by phone or be referred by a doctor, soc...","Older adults who have memory or sensory loss, ...",Rosener House is a day center for older adults...,Age 18 or over,,...,,"ADULT PROTECTION AND CARE SERVICES, Adult Day ...",,Rosener House Adult Day Services,,"Belmont, Burlingame, East Palo Alto",active,No wait.,,
7,8,8,,,,Apply by phone.,"Senior citizens age 60 or over, disabled indiv...",Delivers a hot meal to the home of persons age...,Homebound person unable to cook or shop,,...,,"ADULT PROTECTION AND CARE SERVICES, Meal Sites...",,Meals on Wheels - South County,,"Belmont, East Palo Alto",active,No wait.,,
8,9,9,,,,Walk in. Proof of residency in California requ...,"Ethnic minorities, especially Spanish speaking","Provides general reading material, including b...",Resident of California to obtain a library card,,...,,"EDUCATION SERVICES, Library, Libraries, Public...",,Fair Oaks Branch,,San Mateo County,active,No wait.,,
9,10,10,,,,Walk in. Proof of California residency to rece...,,"Provides general reading and media materials, ...",Resident of California to obtain a card,,...,,"EDUCATION SERVICES, Library, Libraries, Public...",,Main Library,,San Mateo County,active,No wait.,,


In [7]:
# pandas is built for structured (tabular) data.
# Unstructured data such as free text or images must be converted to a table first.

In [8]:
# The complete table is called a DataFrame

df = pd.read_csv('Customer_Churn.csv')
type(df)

pandas.core.frame.DataFrame

In [9]:
# By default pandas treats the first row of the file as the header (column names).
# To change that, pass the 'header' parameter to read_csv.
pd.read_csv('Customer_Churn.csv', header = None)

# 'header = None' tells pandas the file has no header row: the first row is
# kept as data and the columns are labelled 0, 1, 2, ...

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,11,12,13,14,15,16,17,18,19,20
0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
1,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
2,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
3,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
4,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7039,6840-RESVB,Male,0,Yes,Yes,24,Yes,Yes,DSL,Yes,...,Yes,Yes,Yes,Yes,One year,Yes,Mailed check,84.8,1990.5,No
7040,2234-XADUH,Female,0,Yes,Yes,72,Yes,Yes,Fiber optic,No,...,Yes,No,Yes,Yes,One year,Yes,Credit card (automatic),103.2,7362.9,No
7041,4801-JZAZL,Female,0,Yes,Yes,11,No,No phone service,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.6,346.45,No
7042,8361-LTMKD,Male,1,Yes,No,4,Yes,Yes,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Mailed check,74.4,306.6,Yes
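
The effect of header=None can be checked on a small in-memory CSV. This is a minimal sketch: the sample text and the variable names are made up for illustration and are not part of the churn data.

```python
import io
import pandas as pd

# Hypothetical two-column CSV with a header row
csv_text = "id,name\n1,Ann\n2,Bob\n"

# Default: the first line is used as column names
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: the first line is kept as data, columns are numbered
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.shape)
print(no_header.shape)
print(list(no_header.columns))
```

With header=None the former header line shows up as row 0, which is also why every column is then read as text.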


In [10]:
# To supply column names ourselves we can use the 'names' parameter
pd.read_csv(r'taxonomy.csv',names = ["a","b","c"])

# The file has 4 columns but we passed only 3 names.
# pandas assigns the names to the last 3 columns and uses the
# remaining first column as the row index.

# That is why the automatic 0, 1, 2, ... row index is gone, and why the
# file's own header row now appears as the first data row.

Unnamed: 0,a,b,c
taxonomy_id,name,parent_id,parent_name
101,Emergency,,
101-01,Disaster Response,101,Emergency
101-02,Emergency Cash,101,Emergency
101-02-01,Help Pay for Food,101-02,Emergency Cash
...,...,...,...
111-01-07,Workplace Rights,111-01,Advocacy & Legal Aid
111-02,Mediation,111,Legal
111-03,Notary,111,Legal
111-04,Representation,111,Legal


In [11]:
# If we pass more names than the file has columns,
# pandas adds the extra columns filled with NaN
pd.read_csv(r'taxonomy.csv',names = ["a","b","c","d","e"])

Unnamed: 0,a,b,c,d,e
0,taxonomy_id,name,parent_id,parent_name,
1,101,Emergency,,,
2,101-01,Disaster Response,101,Emergency,
3,101-02,Emergency Cash,101,Emergency,
4,101-02-01,Help Pay for Food,101-02,Emergency Cash,
...,...,...,...,...,...
286,111-01-07,Workplace Rights,111-01,Advocacy & Legal Aid,
287,111-02,Mediation,111,Legal,
288,111-03,Notary,111,Legal,
289,111-04,Representation,111,Legal,
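
Both behaviours of the names parameter can be reproduced without the taxonomy file. The sketch below uses a made-up three-column CSV with no header row; the column names are placeholders chosen for the example.

```python
import io
import pandas as pd

# Hypothetical file content: 3 columns, no header row
csv_text = "1,10,20\n2,30,40\n"

# Fewer names than columns: the leading column becomes the row index
fewer = pd.read_csv(io.StringIO(csv_text), names=["x", "y"])

# More names than columns: the extra column is filled with NaN
more = pd.read_csv(io.StringIO(csv_text), names=["id", "x", "y", "z"])

print(fewer)
print(more)
```

When the file does contain its own header row, passing header=0 together with names replaces that row instead of keeping it as data.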


In [12]:
# If the values in the file are not separated by ",", a default read puts everything in one column
pd.read_csv('sample_test_data.csv')

Unnamed: 0,EMPLOYEE_ID@FIRST_NAME@LAST_NAME@EMAIL@PHONE_NUMBER@HIRE_DATE@JOB_ID@SALARY@COMMISSION_PCT@MANAGER_ID@DEPARTMENT_ID
0,198@Donald@OConnell@DOCONNEL@650.507.9833@21-J...
1,199@Douglas@Grant@DGRANT@650.507.9844@13-Jan-0...
2,200@Jennifer@Whalen@JWHALEN@515.123.4444@17-Se...
3,201@Michael@Hartstein@MHARTSTE@515.123.5555@17...
4,202@Pat@Fay@PFAY@603.123.6666@17-Aug-05@MK_REP...
5,203@Susan@Mavris@SMAVRIS@515.123.7777@07-Jun-0...
6,204@Hermann@Baer@HBAER@515.123.8888@07-Jun-02@...
7,205@Shelley@Higgins@SHIGGINS@515.123.8080@07-J...
8,206@William@Gietz@WGIETZ@515.123.8181@07-Jun-0...
9,100@Steven@King@SKING@515.123.4567@17-Jun-03@A...


In [91]:
# To read files where the separator is not ","
# we can use the 'sep' parameter

df1 = pd.read_csv('sample_test_data.csv', sep = "@")
df1

Unnamed: 0,EMPLOYEE_ID,FIRST_NAME,LAST_NAME,EMAIL,PHONE_NUMBER,HIRE_DATE,JOB_ID,SALARY,COMMISSION_PCT,MANAGER_ID,DEPARTMENT_ID
0,198,Donald,OConnell,DOCONNEL,650.507.9833,21-Jun-07,SH_CLERK,2600,-,124,50
1,199,Douglas,Grant,DGRANT,650.507.9844,13-Jan-08,SH_CLERK,2600,-,124,50
2,200,Jennifer,Whalen,JWHALEN,515.123.4444,17-Sep-03,AD_ASST,4400,-,101,10
3,201,Michael,Hartstein,MHARTSTE,515.123.5555,17-Feb-04,MK_MAN,13000,-,100,20
4,202,Pat,Fay,PFAY,603.123.6666,17-Aug-05,MK_REP,6000,-,201,20
5,203,Susan,Mavris,SMAVRIS,515.123.7777,07-Jun-02,HR_REP,6500,-,101,40
6,204,Hermann,Baer,HBAER,515.123.8888,07-Jun-02,PR_REP,10000,-,101,70
7,205,Shelley,Higgins,SHIGGINS,515.123.8080,07-Jun-02,AC_MGR,12008,-,101,110
8,206,William,Gietz,WGIETZ,515.123.8181,07-Jun-02,AC_ACCOUNT,8300,-,205,110
9,100,Steven,King,SKING,515.123.4567,17-Jun-03,AD_PRES,24000,-,-,90
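
In the output above, missing values appear as "-". As a sketch, the na_values parameter of read_csv can turn such placeholders into NaN while reading. The sample text below is invented in the style of sample_test_data.csv and only keeps three of its columns.

```python
import io
import pandas as pd

# Hypothetical sample using "@" as separator and "-" for missing values
text = "EMPLOYEE_ID@FIRST_NAME@COMMISSION_PCT\n198@Donald@-\n199@Douglas@-\n"

# Only the separator is specified: "-" stays a plain string
plain = pd.read_csv(io.StringIO(text), sep="@")

# na_values tells pandas which strings should count as missing
with_na = pd.read_csv(io.StringIO(text), sep="@", na_values=["-"])

print(plain["COMMISSION_PCT"].tolist())
print(with_na["COMMISSION_PCT"].isna().tolist())
```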


In [14]:
# To skip specific lines of the file, use 'skiprows'
pd.read_csv('sample_test_data.csv', sep = "@", skiprows= [1,3])

# skiprows takes 0-based line numbers of the file. Line 0 is the header,
# so [1,3] skips the 1st and 3rd data rows (employees 198 and 200).

Unnamed: 0,EMPLOYEE_ID,FIRST_NAME,LAST_NAME,EMAIL,PHONE_NUMBER,HIRE_DATE,JOB_ID,SALARY,COMMISSION_PCT,MANAGER_ID,DEPARTMENT_ID
0,199,Douglas,Grant,DGRANT,650.507.9844,13-Jan-08,SH_CLERK,2600,-,124,50
1,201,Michael,Hartstein,MHARTSTE,515.123.5555,17-Feb-04,MK_MAN,13000,-,100,20
2,202,Pat,Fay,PFAY,603.123.6666,17-Aug-05,MK_REP,6000,-,201,20
3,203,Susan,Mavris,SMAVRIS,515.123.7777,07-Jun-02,HR_REP,6500,-,101,40
4,204,Hermann,Baer,HBAER,515.123.8888,07-Jun-02,PR_REP,10000,-,101,70
5,205,Shelley,Higgins,SHIGGINS,515.123.8080,07-Jun-02,AC_MGR,12008,-,101,110
6,206,William,Gietz,WGIETZ,515.123.8181,07-Jun-02,AC_ACCOUNT,8300,-,205,110
7,100,Steven,King,SKING,515.123.4567,17-Jun-03,AD_PRES,24000,-,-,90
8,101,Neena,Kochhar,NKOCHHAR,515.123.4568,21-Sep-05,AD_VP,17000,-,100,90
9,102,Lex,De Haan,LDEHAAN,515.123.4569,13-Jan-01,AD_VP,17000,-,100,90
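
How skiprows counts lines is easy to verify on a tiny example. This is a minimal sketch with an invented one-column file.

```python
import io
import pandas as pd

# Hypothetical file: line 0 is the header, lines 1-4 are data
text = "id\n10\n20\n30\n40\n"

# Skip file lines 1 and 3, i.e. the values 10 and 30
kept = pd.read_csv(io.StringIO(text), skiprows=[1, 3])

print(kept["id"].tolist())
```

Passing an integer instead of a list, for example skiprows=2, skips that many lines from the top of the file, header included.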


In [15]:
df = pd.read_csv('Customer_Churn.csv')
df

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.30,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.70,151.65,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,6840-RESVB,Male,0,Yes,Yes,24,Yes,Yes,DSL,Yes,...,Yes,Yes,Yes,Yes,One year,Yes,Mailed check,84.80,1990.5,No
7039,2234-XADUH,Female,0,Yes,Yes,72,Yes,Yes,Fiber optic,No,...,Yes,No,Yes,Yes,One year,Yes,Credit card (automatic),103.20,7362.9,No
7040,4801-JZAZL,Female,0,Yes,Yes,11,No,No phone service,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.60,346.45,No
7041,8361-LTMKD,Male,1,Yes,No,4,Yes,Yes,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Mailed check,74.40,306.6,Yes


In [16]:
# A table loaded with read_csv is a DataFrame
type(df)

pandas.core.frame.DataFrame

In [17]:
# To find the data type of each column
# we can use the "dtypes" attribute

df.dtypes

customerID           object
gender               object
SeniorCitizen         int64
Partner              object
Dependents           object
tenure                int64
PhoneService         object
MultipleLines        object
InternetService      object
OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges      float64
TotalCharges         object
Churn                object
dtype: object
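
In the output above, TotalCharges is reported as object even though it looks numeric. One common way to convert such a column is pd.to_numeric with errors="coerce". The sketch below uses a made-up Series; the blank string is an assumption standing in for whatever value blocks the conversion in the real file.

```python
import pandas as pd

# Hypothetical column: numbers stored as text, plus one unparseable value
charges = pd.Series(["29.85", " ", "108.15"])

# errors="coerce" turns anything that is not a number into NaN
numeric = pd.to_numeric(charges, errors="coerce")

print(numeric.dtype)
print(numeric.isna().sum())
```

Counting the NaN values afterwards shows how many entries could not be converted, which is worth checking before dropping or filling them.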

In [18]:
# To get only the first 5 rows
# we can use the "head()" method
df.head()

# With no argument, "head()" returns the first 5 rows

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


In [19]:
# Similarly, for the first n rows
# use head(n)
df.head(10)

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes
5,9305-CDSKC,Female,0,No,No,8,Yes,Yes,Fiber optic,No,...,Yes,No,Yes,Yes,Month-to-month,Yes,Electronic check,99.65,820.5,Yes
6,1452-KIOVK,Male,0,No,Yes,22,Yes,Yes,Fiber optic,No,...,No,No,Yes,No,Month-to-month,Yes,Credit card (automatic),89.1,1949.4,No
7,6713-OKOMC,Female,0,No,No,10,No,No phone service,DSL,Yes,...,No,No,No,No,Month-to-month,No,Mailed check,29.75,301.9,No
8,7892-POOKP,Female,0,Yes,No,28,Yes,Yes,Fiber optic,No,...,Yes,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,104.8,3046.05,Yes
9,6388-TABGU,Male,0,No,Yes,62,Yes,No,DSL,Yes,...,No,No,No,No,One year,No,Bank transfer (automatic),56.15,3487.95,No


In [20]:
# Similarly, to fetch rows from the bottom
# use tail(n)
df.tail(7)

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
7036,7750-EYXWZ,Female,0,No,No,12,No,No phone service,DSL,No,...,Yes,Yes,Yes,Yes,One year,No,Electronic check,60.65,743.3,No
7037,2569-WGERO,Female,0,No,No,72,Yes,No,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,Yes,Bank transfer (automatic),21.15,1419.4,No
7038,6840-RESVB,Male,0,Yes,Yes,24,Yes,Yes,DSL,Yes,...,Yes,Yes,Yes,Yes,One year,Yes,Mailed check,84.8,1990.5,No
7039,2234-XADUH,Female,0,Yes,Yes,72,Yes,Yes,Fiber optic,No,...,Yes,No,Yes,Yes,One year,Yes,Credit card (automatic),103.2,7362.9,No
7040,4801-JZAZL,Female,0,Yes,Yes,11,No,No phone service,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.6,346.45,No
7041,8361-LTMKD,Male,1,Yes,No,4,Yes,Yes,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Mailed check,74.4,306.6,Yes
7042,3186-AJIEK,Male,0,No,No,66,Yes,No,Fiber optic,Yes,...,Yes,Yes,Yes,Yes,Two year,Yes,Bank transfer (automatic),105.65,6844.5,No


In [21]:
# To fetch the column names
df.columns

# Returns an Index object holding all the column names

Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')

In [22]:
# To get the data from a single column,
# index the dataframe with the column name, like a dictionary key

a = df["customerID"]
a

0       7590-VHVEG
1       5575-GNVDE
2       3668-QPYBK
3       7795-CFOCW
4       9237-HQITU
           ...    
7038    6840-RESVB
7039    2234-XADUH
7040    4801-JZAZL
7041    8361-LTMKD
7042    3186-AJIEK
Name: customerID, Length: 7043, dtype: object

In [23]:
# The type of the above data is a 'Series'
type(a)

# A Series holds a single column (or a single row) of data.
# It behaves like a list, but each value also carries an index label
# and the whole Series has one dtype.

pandas.core.series.Series

In [24]:
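# We cannot select more than one column by passing several names directly
df["customerID",'PaymentMethod']

# pandas reads this as one tuple key ('customerID', 'PaymentMethod'),
# finds no column with that name and raises a KeyError.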

KeyError: ('customerID', 'PaymentMethod')

In [26]:
# To fix the above issue
# We have to pass a list of columns to the dataframe

b = df[["customerID",'PaymentMethod']]
b

Unnamed: 0,customerID,PaymentMethod
0,7590-VHVEG,Electronic check
1,5575-GNVDE,Mailed check
2,3668-QPYBK,Mailed check
3,7795-CFOCW,Bank transfer (automatic)
4,9237-HQITU,Electronic check
...,...,...
7038,6840-RESVB,Mailed check
7039,2234-XADUH,Credit card (automatic)
7040,4801-JZAZL,Electronic check
7041,8361-LTMKD,Mailed check


In [27]:
# A single column (or a single row) is a Series,
# so a selection of more than one column is a DataFrame
type(b)

pandas.core.frame.DataFrame

In [33]:
# But in case of the below example
c = df[["customerID"]]
type(c)

# This is still a DataFrame, even with one column,
# because we are passing a list of columns
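
The selection rules from the last few cells can be summarised in one runnable sketch. The small frame below is invented; only the column names are borrowed from the churn data.

```python
import pandas as pd

# Hypothetical two-row frame reusing two column names from the churn data
df_demo = pd.DataFrame({
    "customerID": ["A1", "B2"],
    "PaymentMethod": ["Mailed check", "Electronic check"],
})

one_col = df_demo["customerID"]            # single name  -> Series
one_col_frame = df_demo[["customerID"]]    # list of one  -> DataFrame
two_cols = df_demo[["customerID", "PaymentMethod"]]  # list of two -> DataFrame

# Several names without the inner list are read as one tuple key
try:
    df_demo["customerID", "PaymentMethod"]
    raised = False
except KeyError:
    raised = True

print(type(one_col).__name__, type(one_col_frame).__name__,
      type(two_cols).__name__, raised)
```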

In [28]:
ls

 Volume in drive C has no label.
 Volume Serial Number is 6A09-6524

 Directory of C:\Users\win\Ineuron Classes

27-10-2023  16:00    <DIR>          .
20-10-2023  23:27    <DIR>          ..
17-10-2023  14:41    <DIR>          .ipynb_checkpoints
20-09-2023  18:18    <DIR>          __pycache__
27-05-2022  11:53             1,173 addresses.csv
22-05-2023  22:22            51,867 car.data
06-12-2022  21:39             5,497 Class 01.ipynb
09-12-2022  22:47            24,947 Class 02.ipynb
20-09-2023  15:38            35,934 Class 03 - Tuple, Set and Dict.ipynb
24-12-2022  00:25            18,225 Class 04 - If, Else & For loop.ipynb
12-03-2023  22:30            17,480 Class 05 - For-else and while loop.ipynb
11-03-2023  22:51            19,426 Class 06 - Loops in details.ipynb
12-03-2023  23:55            25,172 Class 07 - Functions 1.ipynb
20-09-2023  17:44            49,761 Class 08 - Functions 2.ipynb
17-09-2023  02:53            43,389 Class 09 - Iterator Generator and File System.ipynb

In [29]:
# Reading a sheet from an Excel workbook

pd.read_excel("LUSID Excel - Manage Orders.xlsx")

# By default read_excel loads only the first sheet.
# The other sheets are not read.

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18
0,,This sheet allows you to manage executions:,,,,,,,,,,,,,,,,,
1,,1. List executions,,,,,,,,,,,,,,,,,
2,,2. Get execution,,,,,,,,,,,,,,,,,
3,,3. Upsert executions,,,,,,,,,,,,,,,,,
4,,,,,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,,,,,,,,,,,,,,,,,,,
66,,,,,,,,,,,,,,,,,,,
67,,,,,,,,,,,,,,,,,,,
68,,,Code,,,,,,,,,,,,,,,,


In [30]:
# To read another sheet from the workbook
# we have to use the 'sheet_name' parameter
pd.read_excel('LUSID Excel - Manage Orders.xlsx', sheet_name= "Blocks")

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16
0,,This sheet allows you to manage Blocks:,,,,,,,,,,,,,,,
1,,1. ListBlocks,,,,,,,,,,,,,,,
2,,2. GetBlock,,,,,,,,,,,,,,,
3,,3. UpsertBlocks,,,,,,,,,,,,,,,
4,,,,,,,,,,,,,,,,,
5,,,,,,,,,,,,,,,,,
6,,,,,,,,,,,,,,,,,
7,,,,,,,,,,,,,,,,,
8,,1. Get a list of blocks,,,,,,,,,,,,,,,
9,,,,,,,,,,,,,,,,,


In [31]:
# To list all the sheet names in the Excel workbook

# First we open the workbook as an ExcelFile object
# using pd.ExcelFile()
excel_file = pd.ExcelFile("LUSID Excel - Manage Orders.xlsx")

# Then read its 'sheet_names' attribute
excel_file.sheet_names

# The result is a list of all the sheet names

['Executions', 'Orders', 'Allocations', 'Placements', 'Blocks']

In [32]:
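# Using this we can load the data from every sheet into a list of DataFrames
# (read_excel with sheet_name=None would instead return a dict of all sheets)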

list1 = []
for i in excel_file.sheet_names:
    list1.append(pd.read_excel('LUSID Excel - Manage Orders.xlsx', sheet_name= i))

In [33]:
list1[4]

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16
0,,This sheet allows you to manage Blocks:,,,,,,,,,,,,,,,
1,,1. ListBlocks,,,,,,,,,,,,,,,
2,,2. GetBlock,,,,,,,,,,,,,,,
3,,3. UpsertBlocks,,,,,,,,,,,,,,,
4,,,,,,,,,,,,,,,,,
5,,,,,,,,,,,,,,,,,
6,,,,,,,,,,,,,,,,,
7,,,,,,,,,,,,,,,,,
8,,1. Get a list of blocks,,,,,,,,,,,,,,,
9,,,,,,,,,,,,,,,,,


In [34]:
# What if we paste the GitHub page URL directly as the path?

pd.read_csv('https://github.com/datasciencedojo/datasets/blob/master/titanic.csv')

# This does not work: the URL returns the HTML page, not the CSV file itself.
# To fix this, use the "Raw" option on the GitHub file page.

ParserError: Error tokenizing data. C error: Expected 1 fields in line 33, saw 6


In [35]:
# With the "Raw" URL we can load the data directly, without downloading the file first

pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C
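
The two URLs used above differ only in the host name and the "/blob/" segment. As a sketch, a small helper can do that rewrite. The function name to_raw_github_url is hypothetical and not part of pandas; it only handles URLs of the exact shape shown here and performs no network access.

```python
def to_raw_github_url(page_url: str) -> str:
    """Turn a github.com '/blob/' page URL into its raw.githubusercontent.com form."""
    prefix = "https://github.com/"
    # Leave anything that does not look like a GitHub file page untouched
    if not page_url.startswith(prefix) or "/blob/" not in page_url:
        return page_url
    # Drop the first '/blob/' segment and switch to the raw-content host
    path = page_url[len(prefix):].replace("/blob/", "/", 1)
    return "https://raw.githubusercontent.com/" + path


page = "https://github.com/datasciencedojo/datasets/blob/master/titanic.csv"
raw = to_raw_github_url(page)
print(raw)
```

The returned string can then be passed to pd.read_csv in place of the page URL.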


In [36]:
# Reading the tables of a web page with read_html
df2 = pd.read_html('https://www.basketball-reference.com/leagues/NBA_2015_totals.html')
df2

[      Rk          Player Pos Age   Tm   G  GS    MP   FG  FGA  ...   FT%  ORB  \
 0      1      Quincy Acy  PF  24  NYK  68  22  1287  152  331  ...  .784   79   
 1      2    Jordan Adams  SG  20  MEM  30   0   248   35   86  ...  .609    9   
 2      3    Steven Adams   C  21  OKC  70  67  1771  217  399  ...  .502  199   
 3      4     Jeff Adrien  PF  28  MIN  17   0   215   19   44  ...  .579   23   
 4      5   Arron Afflalo  SG  29  TOT  78  72  2502  375  884  ...  .843   27   
 ..   ...             ...  ..  ..  ...  ..  ..   ...  ...  ...  ...   ...  ...   
 670  490  Thaddeus Young  PF  26  TOT  76  68  2434  451  968  ...  .655  127   
 671  490  Thaddeus Young  PF  26  MIN  48  48  1605  289  641  ...  .682   75   
 672  490  Thaddeus Young  PF  26  BRK  28  20   829  162  327  ...  .606   52   
 673  491     Cody Zeller   C  22  CHO  62  45  1487  172  373  ...  .774   97   
 674  492    Tyler Zeller   C  25  BOS  82  59  1731  340  619  ...  .823  146   
 
      DRB  TRB

In [37]:
# read_html returns a list of DataFrames (one per table found on the page),
# not a single DataFrame

type(df2)

list

In [38]:
# Checking the number of elements in the list
len(df2)

1

In [39]:
# Selecting that element
# gives us the DataFrame we want from the URL

d = df2[0]
d

Unnamed: 0,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,1,Quincy Acy,PF,24,NYK,68,22,1287,152,331,...,.784,79,222,301,68,27,22,60,147,398
1,2,Jordan Adams,SG,20,MEM,30,0,248,35,86,...,.609,9,19,28,16,16,7,14,24,94
2,3,Steven Adams,C,21,OKC,70,67,1771,217,399,...,.502,199,324,523,66,38,86,99,222,537
3,4,Jeff Adrien,PF,28,MIN,17,0,215,19,44,...,.579,23,54,77,15,4,9,9,30,60
4,5,Arron Afflalo,SG,29,TOT,78,72,2502,375,884,...,.843,27,220,247,129,41,7,116,167,1035
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
670,490,Thaddeus Young,PF,26,TOT,76,68,2434,451,968,...,.655,127,284,411,173,124,25,117,171,1071
671,490,Thaddeus Young,PF,26,MIN,48,48,1605,289,641,...,.682,75,170,245,135,86,17,75,115,685
672,490,Thaddeus Young,PF,26,BRK,28,20,829,162,327,...,.606,52,114,166,38,38,8,42,56,386
673,491,Cody Zeller,C,22,CHO,62,45,1487,172,373,...,.774,97,265,362,100,34,49,62,156,472


In [40]:
type(d)

pandas.core.frame.DataFrame

In [41]:
d.columns

Index(['Rk', 'Player', 'Pos', 'Age', 'Tm', 'G', 'GS', 'MP', 'FG', 'FGA', 'FG%',
       '3P', '3PA', '3P%', '2P', '2PA', '2P%', 'eFG%', 'FT', 'FTA', 'FT%',
       'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'PTS'],
      dtype='object')

In [42]:
d.head()

Unnamed: 0,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,1,Quincy Acy,PF,24,NYK,68,22,1287,152,331,...,0.784,79,222,301,68,27,22,60,147,398
1,2,Jordan Adams,SG,20,MEM,30,0,248,35,86,...,0.609,9,19,28,16,16,7,14,24,94
2,3,Steven Adams,C,21,OKC,70,67,1771,217,399,...,0.502,199,324,523,66,38,86,99,222,537
3,4,Jeff Adrien,PF,28,MIN,17,0,215,19,44,...,0.579,23,54,77,15,4,9,9,30,60
4,5,Arron Afflalo,SG,29,TOT,78,72,2502,375,884,...,0.843,27,220,247,129,41,7,116,167,1035


In [47]:
d.dtypes
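# 'object' dtype means the column holds generic Python objects, usually strings.
# Numeric-looking columns show up as object when at least one value in them
# could not be parsed as a number.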

Rk        object
Player    object
Pos       object
Age       object
Tm        object
G         object
GS        object
MP        object
FG        object
FGA       object
FG%       object
3P        object
3PA       object
3P%       object
2P        object
2PA       object
2P%       object
eFG%      object
FT        object
FTA       object
FT%       object
ORB       object
DRB       object
TRB       object
AST       object
STL       object
BLK       object
TOV       object
PF        object
PTS       object
dtype: object
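
Every column above is object, including ones such as Age that look numeric. A possible cleanup is sketched below on a made-up frame. It assumes the non-numeric values are header rows repeated inside the table body; that is an assumption about the scraped page, not something the output above confirms.

```python
import pandas as pd

# Made-up stand-in for the scraped table: the third row repeats the header
raw = pd.DataFrame({
    "Rk": ["1", "2", "Rk", "3"],
    "Age": ["24", "20", "Age", "21"],
})

# Drop the rows whose 'Rk' cell is just the column name again
clean = raw[raw["Rk"] != "Rk"].reset_index(drop=True)

# Convert every remaining column from text to numbers
clean = clean.apply(pd.to_numeric)

print(clean.dtypes)
```

On the real table only the numeric columns should be converted, since text columns such as Player would make pd.to_numeric raise an error.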

In [44]:
d[["Rk","Age"]]

Unnamed: 0,Rk,Age
0,1,24
1,2,20
2,3,21
3,4,28
4,5,29
...,...,...
670,490,26
671,490,26
672,490,26
673,491,22


In [48]:
# To export the complete DataFrame to a CSV file
d.to_csv('NBA.csv')

In [49]:
ls

 Volume in drive C has no label.
 Volume Serial Number is 6A09-6524

 Directory of C:\Users\win\Ineuron Classes

27-10-2023  16:02    <DIR>          .
20-10-2023  23:27    <DIR>          ..
17-10-2023  14:41    <DIR>          .ipynb_checkpoints
20-09-2023  18:18    <DIR>          __pycache__
27-05-2022  11:53             1,173 addresses.csv
22-05-2023  22:22            51,867 car.data
06-12-2022  21:39             5,497 Class 01.ipynb
09-12-2022  22:47            24,947 Class 02.ipynb
20-09-2023  15:38            35,934 Class 03 - Tuple, Set and Dict.ipynb
24-12-2022  00:25            18,225 Class 04 - If, Else & For loop.ipynb
12-03-2023  22:30            17,480 Class 05 - For-else and while loop.ipynb
11-03-2023  22:51            19,426 Class 06 - Loops in details.ipynb
12-03-2023  23:55            25,172 Class 07 - Functions 1.ipynb
20-09-2023  17:44            49,761 Class 08 - Functions 2.ipynb
17-09-2023  02:53            43,389 Class 09 - Iterator Generator and File System.ipynb

In [50]:
"""
The read_html option for fetching data has a limitation:
it only picks up data that is available in tabular format
(HTML <table> elements).
All other content on the page is discarded.
"""

'\nThe read_html option for fetching data has a limitation:\nit only picks up data that is available in tabular format\n(HTML <table> elements).\nAll other content on the page is discarded.\n'

In [51]:
# To avoid writing the auto-generated index to the file
# we pass the 'index' parameter while writing the file
d.to_csv('NBA_Without_index.csv', index= False)
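
The difference the index parameter makes is visible in the first line of the written text. In this minimal sketch to_csv is called without a path, so it returns the CSV as a string instead of writing a file; the frame is invented for the example.

```python
import io
import pandas as pd

# Hypothetical two-row frame
frame = pd.DataFrame({"Player": ["A", "B"], "PTS": [10, 20]})

with_index = frame.to_csv()                # index written as an unnamed first column
without_index = frame.to_csv(index=False)  # index left out

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])

# Reading the first version back shows where 'Unnamed: 0' columns come from
reloaded = pd.read_csv(io.StringIO(with_index))
print(list(reloaded.columns))
```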

In [52]:
# The same 'index' parameter is available on other writers such as to_excel.

In [53]:
# Practice for reading more data from web

d1 = pd.read_html('https://www.basketball-reference.com/teams/POR/2015.html')
d1

[    No.             Player Pos    Ht   Wt          Birth Date Unnamed: 6 Exp  \
 0     4      Arron Afflalo  SG   6-5  210    October 15, 1985         us   7   
 1    12  LaMarcus Aldridge  PF  6-11  250       July 19, 1985         us   8   
 2     5        Will Barton  SG   6-6  181     January 6, 1991         us   2   
 3    88      Nicolas Batum  SF   6-8  230   December 14, 1988         fr   6   
 4     5        Steve Blake  PG   6-3  172   February 26, 1980         us  11   
 5    18      Victor Claver  SF   6-9  224     August 30, 1988         es   2   
 6    23       Allen Crabbe  SG   6-5  212       April 9, 1992         us   1   
 7    10        Tim Frazier  PG   6-0  170    November 1, 1990         us   R   
 8    19      Joel Freeland   C  6-10  250    February 7, 1987         gb   2   
 9    33         Alonzo Gee  SF   6-6  225        May 29, 1987         us   5   
 10   35        Chris Kaman   C   7-0  265      April 28, 1982         us  11   
 11   11     Meyers Leonard 

In [54]:
len(d1)

7

In [55]:
d1[0]

Unnamed: 0,No.,Player,Pos,Ht,Wt,Birth Date,Unnamed: 6,Exp,College
0,4,Arron Afflalo,SG,6-5,210,"October 15, 1985",us,7,UCLA
1,12,LaMarcus Aldridge,PF,6-11,250,"July 19, 1985",us,8,Texas
2,5,Will Barton,SG,6-6,181,"January 6, 1991",us,2,Memphis
3,88,Nicolas Batum,SF,6-8,230,"December 14, 1988",fr,6,
4,5,Steve Blake,PG,6-3,172,"February 26, 1980",us,11,Maryland
5,18,Victor Claver,SF,6-9,224,"August 30, 1988",es,2,
6,23,Allen Crabbe,SG,6-5,212,"April 9, 1992",us,1,California
7,10,Tim Frazier,PG,6-0,170,"November 1, 1990",us,R,Penn State
8,19,Joel Freeland,C,6-10,250,"February 7, 1987",gb,2,
9,33,Alonzo Gee,SF,6-6,225,"May 29, 1987",us,5,Alabama


In [56]:
d1[2]

Unnamed: 0,Rk,Player,Age,G,GS,MP,FG,FGA,FG%,3P,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,1,Nicolas Batum,26,5,5,41.8,4.8,14.0,0.343,2.6,...,0.769,1.2,7.4,8.6,5.2,0.2,0.2,2.4,2.2,14.2
1,2,LaMarcus Aldridge,29,5,5,41.6,7.4,22.4,0.33,0.6,...,0.889,4.2,7.0,11.2,1.8,0.4,2.4,1.6,3.2,21.8
2,3,Damian Lillard,24,5,5,40.2,7.8,19.2,0.406,1.0,...,0.781,0.4,3.6,4.0,4.6,0.4,0.6,2.4,3.0,21.6
3,4,CJ McCollum,23,5,1,33.2,6.4,13.4,0.478,2.2,...,0.769,0.2,3.8,4.0,0.4,1.2,0.2,1.2,2.6,17.0
4,5,Robin Lopez,26,5,5,23.4,1.8,3.0,0.6,0.0,...,1.0,1.8,2.6,4.4,0.6,0.2,1.0,1.0,3.0,5.2
5,6,Meyers Leonard,22,5,0,21.2,2.8,4.2,0.667,2.0,...,0.5,1.2,5.4,6.6,1.0,0.4,0.4,0.6,3.4,7.8
6,7,Arron Afflalo,29,3,3,20.0,0.7,4.0,0.167,0.3,...,0.0,0.0,2.3,2.3,0.7,0.0,0.0,1.0,2.0,1.7
7,8,Allen Crabbe,22,2,1,19.5,2.0,2.5,0.8,1.0,...,,0.0,1.5,1.5,0.5,1.0,0.5,0.5,0.5,5.0
8,9,Chris Kaman,32,3,0,12.3,1.3,2.7,0.5,0.0,...,1.0,1.7,3.0,4.7,1.0,0.0,0.0,1.0,1.7,3.0
9,10,Steve Blake,34,5,0,8.6,0.4,2.2,0.182,0.2,...,1.0,0.0,0.2,0.2,1.6,0.0,0.2,0.4,0.8,1.4


In [57]:
d1[1]

Unnamed: 0,Rk,Player,Age,G,GS,MP,FG,FGA,FG%,3P,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,1,Damian Lillard,24,82,82,35.7,7.2,16.6,0.434,2.4,...,0.864,0.6,4.0,4.6,6.2,1.2,0.3,2.7,2.0,21.0
1,2,LaMarcus Aldridge,29,71,71,35.4,9.3,19.9,0.466,0.5,...,0.845,2.5,7.7,10.2,1.7,0.7,1.0,1.7,1.8,23.4
2,3,Wesley Matthews,28,60,60,33.7,5.6,12.5,0.448,2.9,...,0.752,0.6,3.1,3.7,2.3,1.3,0.2,1.4,2.2,15.9
3,4,Nicolas Batum,26,71,71,33.5,3.4,8.5,0.4,1.4,...,0.857,0.9,5.0,5.9,4.8,1.1,0.6,1.9,1.5,9.4
4,5,Arron Afflalo,29,25,19,30.1,3.8,9.1,0.414,1.4,...,0.851,0.2,2.4,2.7,1.1,0.4,0.1,1.3,2.4,10.6
5,6,Robin Lopez,26,59,59,27.8,4.0,7.4,0.535,0.0,...,0.772,3.2,3.5,6.7,0.9,0.3,1.4,1.2,2.1,9.6
6,7,Steve Blake,34,81,0,18.9,1.5,4.0,0.373,1.0,...,0.707,0.2,1.5,1.7,3.6,0.5,0.1,1.3,1.5,4.3
7,8,Chris Kaman,32,74,13,18.9,3.8,7.4,0.515,0.0,...,0.706,2.0,4.5,6.5,0.9,0.2,0.7,1.5,1.9,8.6
8,9,CJ McCollum,23,62,3,15.7,2.6,5.9,0.436,0.9,...,0.699,0.2,1.2,1.5,1.0,0.7,0.1,0.8,1.3,6.8
9,10,Meyers Leonard,22,55,7,15.4,2.3,4.5,0.51,0.9,...,0.938,0.8,3.7,4.5,0.6,0.2,0.3,0.7,2.1,5.9


In [58]:
d1[3]

Unnamed: 0,Rk,Player,Age,G,GS,MP,FG,FGA,FG%,3P,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,1.0,Damian Lillard,24.0,82,82.0,2925,590,1360,0.434,196,...,0.864,49,329,378,507,97,21,222,164,1720
1,2.0,LaMarcus Aldridge,29.0,71,71.0,2512,659,1415,0.466,37,...,0.845,177,549,726,124,48,68,122,125,1661
2,3.0,Nicolas Batum,26.0,71,71.0,2380,240,600,0.4,100,...,0.857,62,354,416,341,78,40,132,106,664
3,4.0,Wesley Matthews,28.0,60,60.0,2024,337,752,0.448,173,...,0.752,38,184,222,139,77,10,81,132,956
4,5.0,Robin Lopez,26.0,59,59.0,1638,234,437,0.535,0,...,0.772,190,204,394,55,16,84,73,122,566
5,6.0,Steve Blake,34.0,81,0.0,1529,122,327,0.373,77,...,0.707,16,121,137,288,41,5,104,118,350
6,7.0,Chris Kaman,32.0,74,13.0,1398,283,549,0.515,0,...,0.706,149,335,484,65,18,54,108,140,638
7,8.0,CJ McCollum,23.0,62,3.0,973,159,365,0.436,55,...,0.699,14,77,91,64,43,8,48,81,424
8,9.0,Meyers Leonard,22.0,55,7.0,847,125,245,0.51,47,...,0.938,46,204,250,32,10,14,39,113,327
9,10.0,Arron Afflalo,29.0,25,19.0,752,94,227,0.414,36,...,0.851,6,61,67,28,9,2,33,59,264


Converting JSON data to a DataFrame

In [69]:
d2 = {
         "_id": 45,
         "name": "Kushagra",
         "email_id": "kushagra@testemail.com",
         "skills": ["Python", "SQL", "Excel", "PowerBI"],
         "previous_company": "Skill-Lync.com",
         "notice_period": [15,30,60]
}

# json.loads() expects JSON text (a string), not a Python dictionary.
# To load this data as JSON we have to wrap it in quotes, e.g. """ ... """,
# otherwise json.loads() will not work

How to work with JSON data

In [66]:
import json

In [70]:
result = json.loads(d2)

# This fails because d2 is a dict, not a JSON string

TypeError: the JSON object must be str, bytes or bytearray, not dict

In [71]:
d2 ="""{
         "_id": 45,
         "name": "Kushagra",
         "email_id": "kushagra@testemail.com",
         "skills": ["Python", "SQL", "Excel", "PowerBI"],
         "previous_company": "Skill-Lync.com",
         "notice_period": [15,30,60]
}"""

# JSON also requires double quotes "" around strings.
# Single quotes '' are not valid JSON, so json.loads() will reject them.

In [72]:
result = json.loads(d2)

In [73]:
result

{'_id': 45,
 'name': 'Kushagra',
 'email_id': 'kushagra@testemail.com',
 'skills': ['Python', 'SQL', 'Excel', 'PowerBI'],
 'previous_company': 'Skill-Lync.com',
 'notice_period': [15, 30, 60]}

In [74]:
type(result)

dict

In [75]:
# We want to convert this dictionary into a dataframe,
# but it does not work as is, because the two list values
# have different lengths ('skills' has 4 elements, 'notice_period' has 3).

pd.DataFrame(result)

# Hence the error "All arrays must be of the same length"

ValueError: All arrays must be of the same length

In [None]:
# To fix that, all the lists in the dictionary
# must have the same number of elements

In [77]:
d3 ="""{
         "_id": 45,
         "name": "Kushagra",
         "email_id": "kushagra@testemail.com",
         "skills": ["Python", "SQL", "Excel", "PowerBI"],
         "previous_company": "Skill-Lync.com",
         "Skill proficiency": ["Good","Average","Good","Average"]
}"""

# Here two of the values are lists,
# and both lists now have 4 elements

In [78]:
result1 = json.loads(d3)
result1

{'_id': 45,
 'name': 'Kushagra',
 'email_id': 'kushagra@testemail.com',
 'skills': ['Python', 'SQL', 'Excel', 'PowerBI'],
 'previous_company': 'Skill-Lync.com',
 'Skill proficiency': ['Good', 'Average', 'Good', 'Average']}

In [79]:
# Now the conversion to dataframe will work

pd.DataFrame(result1)

Unnamed: 0,_id,name,email_id,skills,previous_company,Skill proficiency
0,45,Kushagra,kushagra@testemail.com,Python,Skill-Lync.com,Good
1,45,Kushagra,kushagra@testemail.com,SQL,Skill-Lync.com,Average
2,45,Kushagra,kushagra@testemail.com,Excel,Skill-Lync.com,Good
3,45,Kushagra,kushagra@testemail.com,PowerBI,Skill-Lync.com,Average


In [None]:
# As we can observe from the above JSON conversion,
# the keys are converted into column headers.

# The number of rows depends on the length of the lists
# inside the values.

# Values that are a single str, int, or other scalar
# are repeated in every row.
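When the lists cannot be made the same length, there are other ways to get a dataframe. The sketch below shows two of them on a shortened version of the earlier dictionary; the names `one_row` and `padded` are illustrative.

```python
import pandas as pd

record = {
    "_id": 45,
    "name": "Kushagra",
    "skills": ["Python", "SQL", "Excel", "PowerBI"],
    "notice_period": [15, 30, 60],
}

# Option 1: wrap the dict in a list. Each dict becomes one row,
# and the lists are kept whole as cell values.
one_row = pd.DataFrame([record])
print(one_row.shape)

# Option 2: turn each list into a Series. pandas aligns Series on their
# index, so the shorter list is padded with NaN; scalars are still repeated.
padded = pd.DataFrame(
    {
        key: pd.Series(value) if isinstance(value, list) else value
        for key, value in record.items()
    }
)
print(padded)
```

Option 1 keeps one row per record, which is usually what you want for a list of records; Option 2 keeps one row per list element at the cost of missing values.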

In [80]:
# To access the parsed JSON data,
# index the dictionary by key
result1["name"]

'Kushagra'

In [81]:
result1['skills']

# Here you can see it returns the list.

['Python', 'SQL', 'Excel', 'PowerBI']

In [82]:
# If we want the value as a dataframe, note that the DataFrame
# constructor needs list-like (or dict) input,
# so this only works for a key whose value is a list

pd.DataFrame(result1['name'])

# This raises an error because the value of 'name'
# is a plain string, not a list

ValueError: DataFrame constructor not properly called!

In [85]:
pd.DataFrame(result1['skills'])

# The result is a DataFrame
# despite having only one column

Unnamed: 0,0
0,Python
1,SQL
2,Excel
3,PowerBI
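A short sketch of how a single value or a single list can still be turned into a dataframe with a named column. The column names chosen here are assumptions for illustration.

```python
import pandas as pd

record = {"name": "Kushagra", "skills": ["Python", "SQL", "Excel", "PowerBI"]}

# A bare string is a scalar, so wrap it in a list to get a 1x1 dataframe.
name_df = pd.DataFrame([record["name"]], columns=["name"])

# Passing a dict gives the column a name instead of the default 0.
skills_df = pd.DataFrame({"skills": record["skills"]})

# For a single column, a Series is often the more natural structure.
skills_series = pd.Series(record["skills"], name="skills")

print(name_df)
print(skills_df)
```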


In [89]:
d4  = {
    "packetType":"D",
    "data":
    {
        "checkEngineLightFlag":"F",
        "batteryVoltageStableTime":0,
        "batteryVoltageStable":"0",
        "batteryVoltageOff":"12.42",
        "batteryCrankParamTN":"-0.08",
        "batteryCrankParamVN":"0.00",
        "batteryCrankParamTP":"-0.08",
        "batteryCrankParamVP":"0.00",
        "batteryCrankParamTT":"-0.00008",
        "batteryCrankParamV0":"0.00",
        "batteryVoltageMaxOn":"13.05",
        "batteryVoltageMinOn":"12.97",
        "batteryVoltageMaxOff":"12.46",
        "batteryVoltageMinOff":"12.36",
        "batteryVoltageOnAverage":"13.02",
        "engineLoadMax":"84",
        "engineLoadAverage":"39.98",
        "rpmMax":"3487",
        "rpmAverage":"1431.29",
        "gpsSpeedAverage":"21.99",
        "vssMax":"53.44",
        "vssAverage":"23.06",
        "tcuTemperatureMin":"82.40",
        "tcuTemperatureMax":"109.40",
        "tcuTemperatureAverage":"104.87",
        "coolantMin":"158.00",
        "coolantMax":"188.60",
        "coolantAverage":"180.20",
        "packetStartLocal":1508143346000,
        "tripStartLocal":1508143346000,
        "milIndicator":"F",
        "monitorsNotReady":0,
        "imei":"60DF5417",
        "gatewayTs":1515613306592,
        "diagnosticTroubleCodeData":[],
        "diagnosticPidData":[[64768,47,100],[64768,1,517376],[64800,1,262144],[64768,5,125]]
    },
    "header":
    {
        "iwrapVer":"1.9.20",
        "sourceSystem":"CDP",
        "configVer":"1.1",
        "oemName":"HUM",
        "unitType":0,
        "cpVer":"7.50.1.9",
        "igpsVer":"1.3.7",
        "messageType":"Notification",
        "pomVer":"1.0",
        "headerVer":"V6",
        "timestamp":0,
        "deviceType":"InDrive",
        "visorVer":"1.4.35",
        "transactionId":"53098471-7787-4160-94b3-cd69dcc70416",
        "deviceSerialNo":"60DF5417",
        "subOrganization":"HUM",
        "organization":"HUM",
        "imei":"60DF5417",
        "operation":"Notification"
    }
}

In [90]:
pd.DataFrame(d4)

Unnamed: 0,packetType,data,header
checkEngineLightFlag,D,F,
batteryVoltageStableTime,D,0,
batteryVoltageStable,D,0,
batteryVoltageOff,D,12.42,
batteryCrankParamTN,D,-0.08,
batteryCrankParamVN,D,0.00,
batteryCrankParamTP,D,-0.08,
batteryCrankParamVP,D,0.00,
batteryCrankParamTT,D,-0.00008,
batteryCrankParamV0,D,0.00,
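In the output above the inner keys ended up as row labels, because each nested dictionary was treated as a column. One way to flatten such a structure is `pd.json_normalize`, sketched here on a heavily shortened packet (only a few keys of `d4` are kept for illustration).

```python
import pandas as pd

packet = {
    "packetType": "D",
    "data": {"checkEngineLightFlag": "F", "batteryVoltageOff": "12.42"},
    "header": {"oemName": "HUM", "deviceType": "InDrive"},
}

# json_normalize flattens nested dicts into one row;
# nested keys become column names joined with "." by default.
flat = pd.json_normalize(packet)
print(sorted(flat.columns))
print(flat.shape)
```

This gives one row per packet and one column per leaf key, which is usually easier to analyse than the key-indexed layout.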


In [92]:
d4['data']

# It returns a dictionary

{'checkEngineLightFlag': 'F',
 'batteryVoltageStableTime': 0,
 'batteryVoltageStable': '0',
 'batteryVoltageOff': '12.42',
 'batteryCrankParamTN': '-0.08',
 'batteryCrankParamVN': '0.00',
 'batteryCrankParamTP': '-0.08',
 'batteryCrankParamVP': '0.00',
 'batteryCrankParamTT': '-0.00008',
 'batteryCrankParamV0': '0.00',
 'batteryVoltageMaxOn': '13.05',
 'batteryVoltageMinOn': '12.97',
 'batteryVoltageMaxOff': '12.46',
 'batteryVoltageMinOff': '12.36',
 'batteryVoltageOnAverage': '13.02',
 'engineLoadMax': '84',
 'engineLoadAverage': '39.98',
 'rpmMax': '3487',
 'rpmAverage': '1431.29',
 'gpsSpeedAverage': '21.99',
 'vssMax': '53.44',
 'vssAverage': '23.06',
 'tcuTemperatureMin': '82.40',
 'tcuTemperatureMax': '109.40',
 'tcuTemperatureAverage': '104.87',
 'coolantMin': '158.00',
 'coolantMax': '188.60',
 'coolantAverage': '180.20',
 'packetStartLocal': 1508143346000,
 'tripStartLocal': 1508143346000,
 'milIndicator': 'F',
 'monitorsNotReady': 0,
 'imei': '60DF5417',
 'gatewayTs': 151561

In [93]:
# Now try to build a dataframe from the value stored under the key 'data'

pd.DataFrame(d4['data'])

# This raises 'All arrays must be of the same length'
# because the value is a dictionary in which
# two of the values are lists with different numbers of elements:

# 'diagnosticTroubleCodeData': [],          (0 elements)
# 'diagnosticPidData': [[64768, 47, 100],   (4 elements)

ValueError: All arrays must be of the same length

In [95]:
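# To fix this, we can remove the empty list.
# The 'del' statement deletes a key from a dictionary.
# Note: d5 refers to the same dictionary as d4['data'],
# so the key is removed from d4 as well.

d5 = d4['data']
del d5['diagnosticTroubleCodeData']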

In [96]:
d5

{'checkEngineLightFlag': 'F',
 'batteryVoltageStableTime': 0,
 'batteryVoltageStable': '0',
 'batteryVoltageOff': '12.42',
 'batteryCrankParamTN': '-0.08',
 'batteryCrankParamVN': '0.00',
 'batteryCrankParamTP': '-0.08',
 'batteryCrankParamVP': '0.00',
 'batteryCrankParamTT': '-0.00008',
 'batteryCrankParamV0': '0.00',
 'batteryVoltageMaxOn': '13.05',
 'batteryVoltageMinOn': '12.97',
 'batteryVoltageMaxOff': '12.46',
 'batteryVoltageMinOff': '12.36',
 'batteryVoltageOnAverage': '13.02',
 'engineLoadMax': '84',
 'engineLoadAverage': '39.98',
 'rpmMax': '3487',
 'rpmAverage': '1431.29',
 'gpsSpeedAverage': '21.99',
 'vssMax': '53.44',
 'vssAverage': '23.06',
 'tcuTemperatureMin': '82.40',
 'tcuTemperatureMax': '109.40',
 'tcuTemperatureAverage': '104.87',
 'coolantMin': '158.00',
 'coolantMax': '188.60',
 'coolantAverage': '180.20',
 'packetStartLocal': 1508143346000,
 'tripStartLocal': 1508143346000,
 'milIndicator': 'F',
 'monitorsNotReady': 0,
 'imei': '60DF5417',
 'gatewayTs': 151561

In [97]:
pd.DataFrame(d5)

Unnamed: 0,checkEngineLightFlag,batteryVoltageStableTime,batteryVoltageStable,batteryVoltageOff,batteryCrankParamTN,batteryCrankParamVN,batteryCrankParamTP,batteryCrankParamVP,batteryCrankParamTT,batteryCrankParamV0,...,coolantMin,coolantMax,coolantAverage,packetStartLocal,tripStartLocal,milIndicator,monitorsNotReady,imei,gatewayTs,diagnosticPidData
0,F,0,0,12.42,-0.08,0.0,-0.08,0.0,-8e-05,0.0,...,158.0,188.6,180.2,1508143346000,1508143346000,F,0,60DF5417,1515613306592,"[64768, 47, 100]"
1,F,0,0,12.42,-0.08,0.0,-0.08,0.0,-8e-05,0.0,...,158.0,188.6,180.2,1508143346000,1508143346000,F,0,60DF5417,1515613306592,"[64768, 1, 517376]"
2,F,0,0,12.42,-0.08,0.0,-0.08,0.0,-8e-05,0.0,...,158.0,188.6,180.2,1508143346000,1508143346000,F,0,60DF5417,1515613306592,"[64800, 1, 262144]"
3,F,0,0,12.42,-0.08,0.0,-0.08,0.0,-8e-05,0.0,...,158.0,188.6,180.2,1508143346000,1508143346000,F,0,60DF5417,1515613306592,"[64768, 5, 125]"
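Because `del` changes the original dictionary, a non-mutating variant may be preferable. The sketch below builds a filtered copy with a dict comprehension, using a shortened stand-in for `d4` (the name `packet` and the reduced set of keys are assumptions).

```python
import pandas as pd

packet = {
    "data": {
        "imei": "60DF5417",
        "monitorsNotReady": 0,
        "diagnosticTroubleCodeData": [],
        "diagnosticPidData": [[64768, 47, 100], [64768, 1, 517376]],
    }
}

# Build a new dict that skips empty lists; packet["data"] is left untouched.
cleaned = {
    key: value
    for key, value in packet["data"].items()
    if not (isinstance(value, list) and len(value) == 0)
}

df = pd.DataFrame(cleaned)
print(len(df))
print("diagnosticTroubleCodeData" in packet["data"])
```

The scalar values are repeated for each entry of `diagnosticPidData`, as in the output above, while the source dictionary keeps all of its keys.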


Converting a larger JSON response into a DataFrame

In [3]:
import pandas as pd
import json

In [2]:
url = "https://api.github.com/repos/pandas-dev/pandas/issues"

In [4]:
pd.read_json(url)

Unnamed: 0,url,repository_url,labels_url,comments_url,events_url,html_url,id,node_id,number,title,...,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,timeline_url,performed_via_github_app,state_reason
0,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/pull/55747,1966718773,PR_kwDOAA0YD85eCPkq,55747,DOC: Fix release date for 2.1.2,...,NaT,MEMBER,,0.0,{'url': 'https://api.github.com/repos/pandas-d...,- [ ] closes #xxxx (Replace xxxx with the GitH...,{'url': 'https://api.github.com/repos/pandas-d...,https://api.github.com/repos/pandas-dev/pandas...,,
1,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/pull/55746,1966716042,PR_kwDOAA0YD85eCPBQ,55746,TST: Make old tests more performant,...,NaT,MEMBER,,0.0,{'url': 'https://api.github.com/repos/pandas-d...,,{'url': 'https://api.github.com/repos/pandas-d...,https://api.github.com/repos/pandas-dev/pandas...,,
2,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/pull/55745,1966623483,PR_kwDOAA0YD85eB9Kd,55745,CoW - don't try to update underlying values of...,...,NaT,MEMBER,,1.0,{'url': 'https://api.github.com/repos/pandas-d...,Related to\r\n\r\n* https://github.com/pandas-...,{'url': 'https://api.github.com/repos/pandas-d...,https://api.github.com/repos/pandas-dev/pandas...,,
3,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/pull/55743,1966563313,PR_kwDOAA0YD85eBxli,55743,Ensure HDFStore read gives column-major data w...,...,NaT,MEMBER,,0.0,{'url': 'https://api.github.com/repos/pandas-d...,Fixing a failing test that turned up in https:...,{'url': 'https://api.github.com/repos/pandas-d...,https://api.github.com/repos/pandas-dev/pandas...,,
4,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/pull/55742,1966537901,PR_kwDOAA0YD85eBsed,55742,BUG: fix nanmedian for CoW without bottleneck,...,NaT,MEMBER,,0.0,{'url': 'https://api.github.com/repos/pandas-d...,Fixing a failing test that turned up in https:...,{'url': 'https://api.github.com/repos/pandas-d...,https://api.github.com/repos/pandas-dev/pandas...,,
5,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/pull/55741,1966349677,PR_kwDOAA0YD85eBFn4,55741,POC/ENH: infer resolution in array_to_datetime,...,NaT,MEMBER,,0.0,{'url': 'https://api.github.com/repos/pandas-d...,xref #55564 implements the relevant logic in a...,{'url': 'https://api.github.com/repos/pandas-d...,https://api.github.com/repos/pandas-dev/pandas...,,
6,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/pull/55740,1966246826,PR_kwDOAA0YD85eAvkd,55740,REF: misplaced formatting tests,...,NaT,MEMBER,,0.0,{'url': 'https://api.github.com/repos/pandas-d...,- [ ] closes #xxxx (Replace xxxx with the GitH...,{'url': 'https://api.github.com/repos/pandas-d...,https://api.github.com/repos/pandas-dev/pandas...,,
7,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/pull/55739,1966235720,PR_kwDOAA0YD85eAtIl,55739,DEPS: Test NEP 50,...,NaT,MEMBER,,1.0,{'url': 'https://api.github.com/repos/pandas-d...,,{'url': 'https://api.github.com/repos/pandas-d...,https://api.github.com/repos/pandas-dev/pandas...,,
8,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/pull/55738,1966209966,PR_kwDOAA0YD85eAngm,55738,REF: Compute complete result_index upfront in ...,...,NaT,MEMBER,,1.0,{'url': 'https://api.github.com/repos/pandas-d...,- [ ] closes #xxxx (Replace xxxx with the GitH...,{'url': 'https://api.github.com/repos/pandas-d...,https://api.github.com/repos/pandas-dev/pandas...,,
9,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/issues/55737,1966208295,I_kwDOAA0YD851MfUn,55737,BUG: inferred Timestamp unit with dateutil paths,...,NaT,MEMBER,,,,"```python\r\n>>> pd.Timestamp(""10/27/23 14:19:...",{'url': 'https://api.github.com/repos/pandas-d...,https://api.github.com/repos/pandas-dev/pandas...,,


In [None]:
# The dataframe above still has nested data (dictionaries) inside some of its columns.
# Not ideal.

# Instead, we can load the response as raw JSON.
# We can achieve that with the 'requests' module

In [7]:
import requests as rq

In [8]:
data = rq.get(url)
data

# rq.get() sends an HTTP GET request to the URL;
# <Response [200]> means the request succeeded

<Response [200]>

In [10]:
# Now let's parse the response body as JSON
data_1 = data.json()
data_1

# The entire JSON payload is now loaded as Python lists and dictionaries

[{'url': 'https://api.github.com/repos/pandas-dev/pandas/issues/55747',
  'repository_url': 'https://api.github.com/repos/pandas-dev/pandas',
  'labels_url': 'https://api.github.com/repos/pandas-dev/pandas/issues/55747/labels{/name}',
  'comments_url': 'https://api.github.com/repos/pandas-dev/pandas/issues/55747/comments',
  'events_url': 'https://api.github.com/repos/pandas-dev/pandas/issues/55747/events',
  'html_url': 'https://github.com/pandas-dev/pandas/pull/55747',
  'id': 1966718773,
  'node_id': 'PR_kwDOAA0YD85eCPkq',
  'number': 55747,
  'title': 'DOC: Fix release date for 2.1.2',
  'user': {'login': 'lithomas1',
   'id': 47963215,
   'node_id': 'MDQ6VXNlcjQ3OTYzMjE1',
   'avatar_url': 'https://avatars.githubusercontent.com/u/47963215?v=4',
   'gravatar_id': '',
   'url': 'https://api.github.com/users/lithomas1',
   'html_url': 'https://github.com/lithomas1',
   'followers_url': 'https://api.github.com/users/lithomas1/followers',
   'following_url': 'https://api.github.com/use

In [11]:
# To find out how many dictionaries (issues) the list contains
len(data_1)

# There are 30 dictionaries in the JSON response

30

In [13]:
# We can extract values from the parsed JSON just like from normal lists and dicts
data_1[0]["user"]['id']

47963215

In [14]:
data_1[1]["user"]['html_url']

'https://github.com/mroeschke'

In [61]:
# To print the 'id' of the 'user' for all 30 entries,
# we can loop over the list

for issue in data_1:
    print(issue["user"]['id'])

47963215
10647082
1020496
1020496
1020496
8078968
8078968
10647082
45562402
8078968
45562402
8078968
1055747
1020496
10142876
20886299
103948996
3132181
3132181
609873
86264395
2199875
72244627
45562402
45562402
26783716
8078968
47963215
8078968
91160475
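Several ids repeat in the printed list, since one user can open many issues. A minimal sketch of collecting the values and counting them, using a small made-up list in place of `data_1` (the logins and ids below are hypothetical):

```python
from collections import Counter

# Hypothetical stand-in for data_1: a list of issue dicts with a nested 'user'.
issues = [
    {"number": 1, "user": {"login": "alice", "id": 101}},
    {"number": 2, "user": {"login": "bob", "id": 102}},
    {"number": 3, "user": {"login": "alice", "id": 101}},
]

# Collect the nested ids into a list instead of printing them one by one.
user_ids = [issue["user"]["id"] for issue in issues]

# Count how many issues each user opened.
counts = Counter(issue["user"]["login"] for issue in issues)

print(user_ids)
print(counts.most_common(1))
```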


In [None]:
# To keep only a few selected keys from the
# complete JSON data in the output dataset,
# we can use the 'columns' parameter of the 'DataFrame' constructor



In [90]:
df2 = pd.DataFrame(data_1, columns = ['id','url' , 'repository_url' , 'labels_url','user'])
df2

Unnamed: 0,id,url,repository_url,labels_url,user
0,1966718773,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'lithomas1', 'id': 47963215, 'node_i..."
1,1966716042,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'mroeschke', 'id': 10647082, 'node_i..."
2,1966623483,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'jorisvandenbossche', 'id': 1020496,..."
3,1966563313,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'jorisvandenbossche', 'id': 1020496,..."
4,1966537901,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'jorisvandenbossche', 'id': 1020496,..."
5,1966349677,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'jbrockmendel', 'id': 8078968, 'node..."
6,1966246826,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'jbrockmendel', 'id': 8078968, 'node..."
7,1966235720,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'mroeschke', 'id': 10647082, 'node_i..."
8,1966209966,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'rhshadrach', 'id': 45562402, 'node_..."
9,1966208295,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'jbrockmendel', 'id': 8078968, 'node..."


In [24]:
# Exporting the dataframe as a CSV file

df2.to_csv('json_res.csv')
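By default `to_csv` also writes the row index, which comes back as an `Unnamed: 0` column when the file is read again. The sketch below shows the effect of `index=False`, writing to an in-memory buffer instead of a file; the small dataframe is made up for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "title": ["first", "second"]})

# Default: the index is written as an extra, unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```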

In [119]:
df3 = pd.DataFrame(data_1, columns = ['node_id','number','title'])
df3

Unnamed: 0,node_id,number,title
0,PR_kwDOAA0YD85eCPkq,55747,DOC: Fix release date for 2.1.2
1,PR_kwDOAA0YD85eCPBQ,55746,TST: Make old tests more performant
2,PR_kwDOAA0YD85eB9Kd,55745,CoW - don't try to update underlying values of...
3,PR_kwDOAA0YD85eBxli,55743,Ensure HDFStore read gives column-major data w...
4,PR_kwDOAA0YD85eBsed,55742,BUG: fix nanmedian for CoW without bottleneck
5,PR_kwDOAA0YD85eBFn4,55741,POC/ENH: infer resolution in array_to_datetime
6,PR_kwDOAA0YD85eAvkd,55740,REF: misplaced formatting tests
7,PR_kwDOAA0YD85eAtIl,55739,DEPS: Test NEP 50
8,PR_kwDOAA0YD85eAngm,55738,REF: Compute complete result_index upfront in ...
9,I_kwDOAA0YD851MfUn,55737,BUG: inferred Timestamp unit with dateutil paths


In [115]:
# To turn the keys inside 'user' into their own column headers,
# we have to create another dataframe
# and then join the two together

# Collect the keys of the 'user' dictionary to use as column names
t = list(data_1[0]["user"].keys())

# Indexing reaches the inner dictionary;
# wrapping it in a list makes the "DataFrame" constructor
# treat it as a single row
df4 = pd.DataFrame([data_1[0]['user']], columns = t)

for i in range(1, len(data_1)):
    df5 = pd.DataFrame([data_1[i]['user']], columns = t)
    
    # Use the concat() function to join the dataframes vertically
    df4 = pd.concat([df4, df5], axis = 0, ignore_index = True)
    
    # Older code used the append() method to add entries below:
    # df4 = df4.append(df5)
    # DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0,
    # so concat() is the one to use
    
df4

Unnamed: 0,login,id,node_id,avatar_url,gravatar_id,url,html_url,followers_url,following_url,gists_url,starred_url,subscriptions_url,organizations_url,repos_url,events_url,received_events_url,type,site_admin
0,lithomas1,47963215,MDQ6VXNlcjQ3OTYzMjE1,https://avatars.githubusercontent.com/u/479632...,,https://api.github.com/users/lithomas1,https://github.com/lithomas1,https://api.github.com/users/lithomas1/followers,https://api.github.com/users/lithomas1/followi...,https://api.github.com/users/lithomas1/gists{/...,https://api.github.com/users/lithomas1/starred...,https://api.github.com/users/lithomas1/subscri...,https://api.github.com/users/lithomas1/orgs,https://api.github.com/users/lithomas1/repos,https://api.github.com/users/lithomas1/events{...,https://api.github.com/users/lithomas1/receive...,User,False
1,mroeschke,10647082,MDQ6VXNlcjEwNjQ3MDgy,https://avatars.githubusercontent.com/u/106470...,,https://api.github.com/users/mroeschke,https://github.com/mroeschke,https://api.github.com/users/mroeschke/followers,https://api.github.com/users/mroeschke/followi...,https://api.github.com/users/mroeschke/gists{/...,https://api.github.com/users/mroeschke/starred...,https://api.github.com/users/mroeschke/subscri...,https://api.github.com/users/mroeschke/orgs,https://api.github.com/users/mroeschke/repos,https://api.github.com/users/mroeschke/events{...,https://api.github.com/users/mroeschke/receive...,User,False
2,jorisvandenbossche,1020496,MDQ6VXNlcjEwMjA0OTY=,https://avatars.githubusercontent.com/u/102049...,,https://api.github.com/users/jorisvandenbossche,https://github.com/jorisvandenbossche,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,User,False
3,jorisvandenbossche,1020496,MDQ6VXNlcjEwMjA0OTY=,https://avatars.githubusercontent.com/u/102049...,,https://api.github.com/users/jorisvandenbossche,https://github.com/jorisvandenbossche,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,User,False
4,jorisvandenbossche,1020496,MDQ6VXNlcjEwMjA0OTY=,https://avatars.githubusercontent.com/u/102049...,,https://api.github.com/users/jorisvandenbossche,https://github.com/jorisvandenbossche,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,User,False
5,jbrockmendel,8078968,MDQ6VXNlcjgwNzg5Njg=,https://avatars.githubusercontent.com/u/807896...,,https://api.github.com/users/jbrockmendel,https://github.com/jbrockmendel,https://api.github.com/users/jbrockmendel/foll...,https://api.github.com/users/jbrockmendel/foll...,https://api.github.com/users/jbrockmendel/gist...,https://api.github.com/users/jbrockmendel/star...,https://api.github.com/users/jbrockmendel/subs...,https://api.github.com/users/jbrockmendel/orgs,https://api.github.com/users/jbrockmendel/repos,https://api.github.com/users/jbrockmendel/even...,https://api.github.com/users/jbrockmendel/rece...,User,False
6,jbrockmendel,8078968,MDQ6VXNlcjgwNzg5Njg=,https://avatars.githubusercontent.com/u/807896...,,https://api.github.com/users/jbrockmendel,https://github.com/jbrockmendel,https://api.github.com/users/jbrockmendel/foll...,https://api.github.com/users/jbrockmendel/foll...,https://api.github.com/users/jbrockmendel/gist...,https://api.github.com/users/jbrockmendel/star...,https://api.github.com/users/jbrockmendel/subs...,https://api.github.com/users/jbrockmendel/orgs,https://api.github.com/users/jbrockmendel/repos,https://api.github.com/users/jbrockmendel/even...,https://api.github.com/users/jbrockmendel/rece...,User,False
7,mroeschke,10647082,MDQ6VXNlcjEwNjQ3MDgy,https://avatars.githubusercontent.com/u/106470...,,https://api.github.com/users/mroeschke,https://github.com/mroeschke,https://api.github.com/users/mroeschke/followers,https://api.github.com/users/mroeschke/followi...,https://api.github.com/users/mroeschke/gists{/...,https://api.github.com/users/mroeschke/starred...,https://api.github.com/users/mroeschke/subscri...,https://api.github.com/users/mroeschke/orgs,https://api.github.com/users/mroeschke/repos,https://api.github.com/users/mroeschke/events{...,https://api.github.com/users/mroeschke/receive...,User,False
8,rhshadrach,45562402,MDQ6VXNlcjQ1NTYyNDAy,https://avatars.githubusercontent.com/u/455624...,,https://api.github.com/users/rhshadrach,https://github.com/rhshadrach,https://api.github.com/users/rhshadrach/followers,https://api.github.com/users/rhshadrach/follow...,https://api.github.com/users/rhshadrach/gists{...,https://api.github.com/users/rhshadrach/starre...,https://api.github.com/users/rhshadrach/subscr...,https://api.github.com/users/rhshadrach/orgs,https://api.github.com/users/rhshadrach/repos,https://api.github.com/users/rhshadrach/events...,https://api.github.com/users/rhshadrach/receiv...,User,False
9,jbrockmendel,8078968,MDQ6VXNlcjgwNzg5Njg=,https://avatars.githubusercontent.com/u/807896...,,https://api.github.com/users/jbrockmendel,https://github.com/jbrockmendel,https://api.github.com/users/jbrockmendel/foll...,https://api.github.com/users/jbrockmendel/foll...,https://api.github.com/users/jbrockmendel/gist...,https://api.github.com/users/jbrockmendel/star...,https://api.github.com/users/jbrockmendel/subs...,https://api.github.com/users/jbrockmendel/orgs,https://api.github.com/users/jbrockmendel/repos,https://api.github.com/users/jbrockmendel/even...,https://api.github.com/users/jbrockmendel/rece...,User,False
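The loop above can be written more compactly. This is a sketch under the assumption that every record has a `user` dictionary with the same keys; it uses a tiny made-up list in place of `data_1`. `pd.json_normalize` goes one step further and flattens the nested dictionary into prefixed columns in a single call.

```python
import pandas as pd

# Hypothetical stand-in for data_1.
issues = [
    {"id": 1, "title": "first", "user": {"login": "alice", "id": 101}},
    {"id": 2, "title": "second", "user": {"login": "bob", "id": 102}},
]

# A list of dicts becomes one row per dict, so no loop or concat is needed.
users = pd.DataFrame([issue["user"] for issue in issues])

# json_normalize flattens 'user' into columns such as 'user_login',
# already aligned with the top-level columns of each record.
flat = pd.json_normalize(issues, sep="_")

print(users)
print(sorted(flat.columns))
```

Because the nested keys get a `user_` prefix, the top-level `id` and the user's `id` stay distinguishable without a separate join step.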


In [124]:
# Now join the earlier dataframes with the new one.

df6 = pd.concat([df2, df3], axis = 1)
result2 = df6.join(df4, rsuffix='_user')
result2

Unnamed: 0,id,url,repository_url,labels_url,user,node_id,number,title,login,id_user,...,following_url,gists_url,starred_url,subscriptions_url,organizations_url,repos_url,events_url,received_events_url,type,site_admin
0,1966718773,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'lithomas1', 'id': 47963215, 'node_i...",PR_kwDOAA0YD85eCPkq,55747,DOC: Fix release date for 2.1.2,lithomas1,47963215,...,https://api.github.com/users/lithomas1/followi...,https://api.github.com/users/lithomas1/gists{/...,https://api.github.com/users/lithomas1/starred...,https://api.github.com/users/lithomas1/subscri...,https://api.github.com/users/lithomas1/orgs,https://api.github.com/users/lithomas1/repos,https://api.github.com/users/lithomas1/events{...,https://api.github.com/users/lithomas1/receive...,User,False
1,1966716042,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'mroeschke', 'id': 10647082, 'node_i...",PR_kwDOAA0YD85eCPBQ,55746,TST: Make old tests more performant,mroeschke,10647082,...,https://api.github.com/users/mroeschke/followi...,https://api.github.com/users/mroeschke/gists{/...,https://api.github.com/users/mroeschke/starred...,https://api.github.com/users/mroeschke/subscri...,https://api.github.com/users/mroeschke/orgs,https://api.github.com/users/mroeschke/repos,https://api.github.com/users/mroeschke/events{...,https://api.github.com/users/mroeschke/receive...,User,False
2,1966623483,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'jorisvandenbossche', 'id': 1020496,...",PR_kwDOAA0YD85eB9Kd,55745,CoW - don't try to update underlying values of...,jorisvandenbossche,1020496,...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,User,False
3,1966563313,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'jorisvandenbossche', 'id': 1020496,...",PR_kwDOAA0YD85eBxli,55743,Ensure HDFStore read gives column-major data w...,jorisvandenbossche,1020496,...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,User,False
4,1966537901,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'jorisvandenbossche', 'id': 1020496,...",PR_kwDOAA0YD85eBsed,55742,BUG: fix nanmedian for CoW without bottleneck,jorisvandenbossche,1020496,...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,https://api.github.com/users/jorisvandenbossch...,User,False
5,1966349677,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'jbrockmendel', 'id': 8078968, 'node...",PR_kwDOAA0YD85eBFn4,55741,POC/ENH: infer resolution in array_to_datetime,jbrockmendel,8078968,...,https://api.github.com/users/jbrockmendel/foll...,https://api.github.com/users/jbrockmendel/gist...,https://api.github.com/users/jbrockmendel/star...,https://api.github.com/users/jbrockmendel/subs...,https://api.github.com/users/jbrockmendel/orgs,https://api.github.com/users/jbrockmendel/repos,https://api.github.com/users/jbrockmendel/even...,https://api.github.com/users/jbrockmendel/rece...,User,False
6,1966246826,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'jbrockmendel', 'id': 8078968, 'node...",PR_kwDOAA0YD85eAvkd,55740,REF: misplaced formatting tests,jbrockmendel,8078968,...,https://api.github.com/users/jbrockmendel/foll...,https://api.github.com/users/jbrockmendel/gist...,https://api.github.com/users/jbrockmendel/star...,https://api.github.com/users/jbrockmendel/subs...,https://api.github.com/users/jbrockmendel/orgs,https://api.github.com/users/jbrockmendel/repos,https://api.github.com/users/jbrockmendel/even...,https://api.github.com/users/jbrockmendel/rece...,User,False
7,1966235720,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'mroeschke', 'id': 10647082, 'node_i...",PR_kwDOAA0YD85eAtIl,55739,DEPS: Test NEP 50,mroeschke,10647082,...,https://api.github.com/users/mroeschke/followi...,https://api.github.com/users/mroeschke/gists{/...,https://api.github.com/users/mroeschke/starred...,https://api.github.com/users/mroeschke/subscri...,https://api.github.com/users/mroeschke/orgs,https://api.github.com/users/mroeschke/repos,https://api.github.com/users/mroeschke/events{...,https://api.github.com/users/mroeschke/receive...,User,False
8,1966209966,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'rhshadrach', 'id': 45562402, 'node_...",PR_kwDOAA0YD85eAngm,55738,REF: Compute complete result_index upfront in ...,rhshadrach,45562402,...,https://api.github.com/users/rhshadrach/follow...,https://api.github.com/users/rhshadrach/gists{...,https://api.github.com/users/rhshadrach/starre...,https://api.github.com/users/rhshadrach/subscr...,https://api.github.com/users/rhshadrach/orgs,https://api.github.com/users/rhshadrach/repos,https://api.github.com/users/rhshadrach/events...,https://api.github.com/users/rhshadrach/receiv...,User,False
9,1966208295,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,"{'login': 'jbrockmendel', 'id': 8078968, 'node...",I_kwDOAA0YD851MfUn,55737,BUG: inferred Timestamp unit with dateutil paths,jbrockmendel,8078968,...,https://api.github.com/users/jbrockmendel/foll...,https://api.github.com/users/jbrockmendel/gist...,https://api.github.com/users/jbrockmendel/star...,https://api.github.com/users/jbrockmendel/subs...,https://api.github.com/users/jbrockmendel/orgs,https://api.github.com/users/jbrockmendel/repos,https://api.github.com/users/jbrockmendel/even...,https://api.github.com/users/jbrockmendel/rece...,User,False


In [125]:
# Export the DataFrame to a CSV file

result2.to_csv('json_res_nested_dict.csv')
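
A minimal sketch of what the export does with dictionary-valued columns such as the ones shown in the output above. The `demo` frame and its column names are hypothetical stand-ins for `result2`, and an in-memory buffer replaces the file on disk. Passing `index=False` is an optional choice, not something the original cell does: it leaves the row index out of the file.

```python
import io

import pandas as pd

# Hypothetical stand-in for result2: one flat column and one column of dicts.
demo = pd.DataFrame({
    "number": [55743, 55742],
    "user": [{"login": "a", "id": 1}, {"login": "b", "id": 2}],
})

# Write to an in-memory buffer instead of a file on disk.
# index=False omits the row index, so no "Unnamed: 0" column appears on re-read.
buffer = io.StringIO()
demo.to_csv(buffer, index=False)
buffer.seek(0)

restored = pd.read_csv(buffer)

# Dict values are written as their string representation,
# so after the round trip they are plain strings, not dicts.
user_type = type(restored.loc[0, "user"])
columns = list(restored.columns)
print(columns, user_type)
```

If the nested values are needed again after reloading, it is usually simpler to flatten them before exporting (for example with `pd.json_normalize`) than to parse the strings back.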

# Data Manipulation