## Types of data 

<img src="https://lawtomated.com/wp-content/uploads/2019/04/structuredVsUnstructuredIgneos.png" width="500">

In this course, we will focus on analyzing structured data.

##  Using Pandas to Read CSVs
[Pandas](https://pandas.pydata.org/) is a popular Python library for working with tabular data (similar to the data stored in a spreadsheet). It provides helper functions to read data from various file formats such as CSV, Excel spreadsheets, HTML tables, JSON, SQL, and more.

The format below is known as *comma-separated values*, or CSV. This sample contains daily Covid-19 data for Italy:
```
date,new_cases,new_deaths,new_tests
2020-04-21,2256.0,454.0,28095.0
2020-04-22,2729.0,534.0,44248.0
2020-04-23,3370.0,437.0,37083.0
2020-04-24,2646.0,464.0,95273.0
2020-04-25,3021.0,420.0,38676.0
2020-04-26,2357.0,415.0,24113.0
2020-04-27,2324.0,260.0,26678.0
2020-04-28,1739.0,333.0,37554.0
...
```
> **CSVs**: A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. A CSV file typically stores tabular data (numbers and text) in plain text, in which case each line will have the same number of fields. (Wikipedia)


 First, let's import the Pandas library. As a convention, it is imported with the alias `pd`.

In [2]:
import pandas as pd
import os

### Using the `read_csv` function
The `pd.read_csv` function can be used to read a CSV file into a pandas `DataFrame`: a spreadsheet-like object for analyzing and processing data. 

In [3]:
movie_ratings = pd.read_csv('../data/movie_ratings.csv')
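
If the data file is not at hand, `pd.read_csv` can be tried on an in-memory string instead. The following is a minimal sketch, assuming the first three records of the Covid-19 sample shown earlier; `io.StringIO` stands in for a file on disk, and the names `csv_text` and `covid_sample` are illustrative rather than part of the course data.

```python
import io
import pandas as pd

# First three records of the Covid-19 sample, held in a string instead of a file
csv_text = """date,new_cases,new_deaths,new_tests
2020-04-21,2256.0,454.0,28095.0
2020-04-22,2729.0,534.0,44248.0
2020-04-23,3370.0,437.0,37083.0
"""

# read_csv accepts any file-like object, so StringIO behaves like an open file
covid_sample = pd.read_csv(io.StringIO(csv_text))

print(covid_sample.shape)          # (number of rows, number of columns)
print(list(covid_sample.columns))  # the header line becomes the column names
```

The header line is consumed as column names, so only the three data records count as rows.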

The built-in Python function `type` can be used to check the data type of an object:

In [4]:
type(movie_ratings)

pandas.core.frame.DataFrame

We'll learn more about `DataFrame` in a future lesson.

Note that I use a relative path to specify the location of *movie_ratings.csv*; you may need to change it based on where you store the data file.

### Data Overview
Once the data has been read, we may want to see what it looks like. We’ll use the DataFrame method `head()` to view the first few rows of the data.

In [5]:
movie_ratings.head()

Unnamed: 0,Title,US Gross,Worldwide Gross,Production Budget,Release Date,MPAA Rating,Source,Major Genre,Creative Type,IMDB Rating,IMDB Votes
0,Opal Dreams,14443,14443,9000000,Nov 22 2006,PG/PG-13,Adapted screenplay,Drama,Fiction,6.5,468
1,Major Dundee,14873,14873,3800000,Apr 07 1965,PG/PG-13,Adapted screenplay,Western/Musical,Fiction,6.7,2588
2,The Informers,315000,315000,18000000,Apr 24 2009,R,Adapted screenplay,Horror/Thriller,Fiction,5.2,7595
3,Buffalo Soldiers,353743,353743,15000000,Jul 25 2003,R,Adapted screenplay,Comedy,Fiction,6.9,13510
4,The Last Sin Eater,388390,388390,2200000,Feb 09 2007,PG/PG-13,Adapted screenplay,Drama,Fiction,5.7,1012


**Row Indices and column names (axis labels)**

By default, when you create a pandas DataFrame (or Series) without specifying an index, pandas automatically assigns integer row indices starting from 0. These indices serve as the row labels and uniquely identify each row in the DataFrame. For example, the index 2 corresponds to the row of the movie The Informers. The indices can be changed, even to non-integer values, if desired.

The bold text on top of the DataFrame refers to column names. For example, the column `US Gross` consists of the gross revenue of a movie in the US.

Collectively, the indices and column names are referred to as **axis labels**.
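
The two kinds of axis labels can be inspected directly. The sketch below assumes a tiny illustrative frame named `films` (values copied from the first rows shown above) and uses `set_index` to show that row labels need not be integers.

```python
import pandas as pd

# A tiny illustrative frame, not the full movie_ratings data
films = pd.DataFrame({
    'Title': ['Opal Dreams', 'Major Dundee', 'The Informers'],
    'US Gross': [14443, 14873, 315000],
})

print(films.index)    # default row labels: RangeIndex(start=0, stop=3, step=1)
print(films.columns)  # column names

# Row labels can be replaced: here the titles become the index
films_by_title = films.set_index('Title')
print(films_by_title.loc['The Informers', 'US Gross'])
```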

**Basic information**
We can view some basic information about the data frame using the `.info` method.

In [6]:
movie_ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2228 entries, 0 to 2227
Data columns (total 11 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Title              2228 non-null   object 
 1   US Gross           2228 non-null   int64  
 2   Worldwide Gross    2228 non-null   int64  
 3   Production Budget  2228 non-null   int64  
 4   Release Date       2228 non-null   object 
 5   MPAA Rating        2228 non-null   object 
 6   Source             2228 non-null   object 
 7   Major Genre        2228 non-null   object 
 8   Creative Type      2228 non-null   object 
 9   IMDB Rating        2228 non-null   float64
 10  IMDB Votes         2228 non-null   int64  
dtypes: float64(1), int64(4), object(6)
memory usage: 191.6+ KB


The `shape` property of a pandas DataFrame provides a tuple that represents the dimensions of the DataFrame:

* The first value in the tuple is the number of rows.
* The second value in the tuple is the number of columns.


In [7]:
movie_ratings.shape

(2228, 11)

The `columns` property contains the list of columns within the data frame.

In [8]:
movie_ratings.columns

Index(['Title', 'US Gross', 'Worldwide Gross', 'Production Budget',
       'Release Date', 'MPAA Rating', 'Source', 'Major Genre', 'Creative Type',
       'IMDB Rating', 'IMDB Votes'],
      dtype='object')

You can view statistical information for numerical columns (mean, standard deviation, minimum/maximum values, and the number of non-empty values) using the `.describe` method.

In [9]:
movie_ratings.describe()

Unnamed: 0,US Gross,Worldwide Gross,Production Budget,IMDB Rating,IMDB Votes
count,2228.0,2228.0,2228.0,2228.0,2228.0
mean,50763700.0,101937000.0,38160550.0,6.239004,33585.154847
std,66430810.0,164858900.0,37826040.0,1.243285,47325.651561
min,0.0,884.0,218.0,1.4,18.0
25%,9646188.0,13207370.0,12000000.0,5.5,6659.25
50%,28386490.0,42668920.0,26000000.0,6.4,18169.0
75%,64531400.0,120000000.0,53000000.0,7.1,40092.75
max,760167600.0,2767891000.0,300000000.0,9.2,519541.0


Functions & methods we've looked at so far:

* `pd.read_csv` - Read data from a CSV file into a Pandas `DataFrame` object
* `.head()` - View the first few rows of the data
* `.info()` - View basic information about rows, columns & data types
* `.shape` - Get the number of rows & columns as a tuple
* `.columns` - Get the list of column names
* `.describe()` - View statistical information about numeric columns

## Data Selection and Filtering

###  Extracting Column(s) from pandas

The first step when working with a DataFrame is often to extract one or more columns. To do this effectively, it’s helpful to understand the internal structure of a DataFrame. Conceptually, you can think of a DataFrame as a dictionary of lists, where the keys are column names, and the values are lists or arrays containing data for the respective columns.

In [10]:
# The structure of a pandas DataFrame is similar to this
movie_ratings_dict = {
    'Title':  ['Opal Dreams', 'Major Dundee', 'The Informers', 'Buffalo Soldiers', 'The Last Sin Eater'],
    'US Gross':  [14443, 14873, 315000, 353743, 388390],
    'Worldwide Gross': [14443, 14873, 315000, 353743, 388390],
    'Production Budget': [9000000, 3800000, 18000000, 15000000, 2200000]
}

For a dictionary, we use a key to retrieve its values:

In [11]:
movie_ratings_dict['Title']

['Opal Dreams',
 'Major Dundee',
 'The Informers',
 'Buffalo Soldiers',
 'The Last Sin Eater']
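
To make the analogy concrete, a dictionary of lists can be passed straight to the `pd.DataFrame` constructor. This is a minimal sketch using a shortened, illustrative dictionary named `ratings_dict`; it is not how `movie_ratings` itself was built.

```python
import pandas as pd

# Same structure as the dictionary above, shortened to three rows
ratings_dict = {
    'Title': ['Opal Dreams', 'Major Dundee', 'The Informers'],
    'US Gross': [14443, 14873, 315000],
}

# Each key becomes a column name; each list becomes that column's values
ratings_df = pd.DataFrame(ratings_dict)
print(ratings_df.shape)

# Converting back with orient='list' recovers the dictionary of lists
round_trip = ratings_df.to_dict(orient='list')
print(round_trip == ratings_dict)
```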

Similar to a dictionary, we can extract a column by its column name:

In [12]:
movie_ratings['Title']

0                         Opal Dreams
1                        Major Dundee
2                       The Informers
3                    Buffalo Soldiers
4                  The Last Sin Eater
                    ...              
2223                      King Arthur
2224                            Mulan
2225                       Robin Hood
2226    Robin Hood: Prince of Thieves
2227                       Spiceworld
Name: Title, Length: 2228, dtype: object


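Each column is also exposed as an attribute of the DataFrame, so we can use the `.` operator to extract a single column. This works only when the column name is a valid Python identifier (for example, `Title` but not `US Gross`).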

In [13]:
movie_ratings.Title

0                         Opal Dreams
1                        Major Dundee
2                       The Informers
3                    Buffalo Soldiers
4                  The Last Sin Eater
                    ...              
2223                      King Arthur
2224                            Mulan
2225                       Robin Hood
2226    Robin Hood: Prince of Thieves
2227                       Spiceworld
Name: Title, Length: 2228, dtype: object
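
The two notations can be compared on a small frame. The sketch below assumes an illustrative two-row frame named `films`; it shows that dot access and bracket access return the same column, and that a column name containing a space is reached with brackets.

```python
import pandas as pd

films = pd.DataFrame({
    'Title': ['Opal Dreams', 'Major Dundee'],
    'US Gross': [14443, 14873],
})

# Dot access works because 'Title' is a valid Python identifier
same = films.Title.equals(films['Title'])
print(same)

# 'US Gross' contains a space, so bracket notation is required;
# writing films.US Gross would be a syntax error
us_gross = films['US Gross']
print(us_gross.tolist())
```

Bracket notation works for every column name, which makes it the safer default.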

When extracting multiple columns, you need to place the column names inside a list.

In [14]:
movie_ratings[['Title', 'US Gross', 'Worldwide Gross' ]]

Unnamed: 0,Title,US Gross,Worldwide Gross
0,Opal Dreams,14443,14443
1,Major Dundee,14873,14873
2,The Informers,315000,315000
3,Buffalo Soldiers,353743,353743
4,The Last Sin Eater,388390,388390
...,...,...,...
2223,King Arthur,51877963,203877963
2224,Mulan,120620254,303500000
2225,Robin Hood,105269730,310885538
2226,Robin Hood: Prince of Thieves,165493908,390500000


### Extracting a sub-set of data: `loc` and `iloc`

Sometimes we may be interested in working with a subset of rows and columns of the data, instead of working with the entire dataset. The indexing operators [loc](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html) and [iloc](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html) provide a convenient way of selecting a subset of desired rows and columns. 

Let us first sort the `movie_ratings` data frame by `IMDB Rating`.

In [16]:
movie_ratings_sorted = movie_ratings.sort_values(by = 'IMDB Rating', ascending = False)
movie_ratings_sorted.head()

Unnamed: 0,Title,US Gross,Worldwide Gross,Production Budget,Release Date,MPAA Rating,Source,Major Genre,Creative Type,IMDB Rating,IMDB Votes,ratio_wgross_by_budget
182,The Shawshank Redemption,28241469,28241469,25000000,Sep 23 1994,R,Adapted screenplay,Drama,Fiction,9.2,519541,1.129659
2084,Inception,285630280,753830280,160000000,Jul 16 2010,PG/PG-13,Original Screenplay,Horror/Thriller,Fiction,9.1,188247,4.711439
2092,Toy Story 3,410640665,1046340665,200000000,Jun 18 2010,G,Original Screenplay,Action/Adventure,Fiction,8.9,67380,5.231703
1962,Pulp Fiction,107928762,212928762,8000000,Oct 14 1994,R,Original Screenplay,Drama,Fiction,8.9,417703,26.616095
790,Schindler's List,96067179,321200000,25000000,Dec 15 1993,R,Adapted screenplay,Drama,Non-Fiction,8.9,276283,12.848


#### Subsetting the DataFrame by `loc`
The operator `loc` uses axis labels (row indices and column names) to subset the data.


Let's subset the `Title`, `IMDB Rating`, `US Gross`, `Worldwide Gross`, and `Production Budget` of the top 3 movies.

In [17]:
# Subsetting the DataFrame by loc - using axis labels
movies_subset = movie_ratings_sorted.loc[[182,2084, 2092],[ 'Title', 'IMDB Rating', 'US Gross', 'Worldwide Gross', 'Production Budget']]
movies_subset

Unnamed: 0,Title,IMDB Rating,US Gross,Worldwide Gross,Production Budget
182,The Shawshank Redemption,9.2,28241469,28241469,25000000
2084,Inception,9.1,285630280,753830280,160000000
2092,Toy Story 3,8.9,410640665,1046340665,200000000


The `:` symbol in `.loc` and `.iloc` is a slicing operator that represents a range or all elements in the specified dimension (rows or columns). Use `:` alone to select all rows/columns, or with start/end points to slice specific parts of the DataFrame.

In [18]:
# Subsetting the DataFrame by loc - using axis labels. the colon is used to select all rows
movies_subset = movie_ratings_sorted.loc[:,['Title','Worldwide Gross','Production Budget','IMDB Rating']]
movies_subset

Unnamed: 0,Title,Worldwide Gross,Production Budget,IMDB Rating
182,The Shawshank Redemption,28241469,25000000,9.2
2084,Inception,753830280,160000000,9.1
2092,Toy Story 3,1046340665,200000000,8.9
1962,Pulp Fiction,212928762,8000000,8.9
790,Schindler's List,321200000,25000000,8.9
...,...,...,...,...
516,Son of the Mask,59918422,100000000,2.0
1495,Disaster Movie,34690901,20000000,1.7
1116,Crossover,7009668,5600000,1.7
805,From Justin to Kelly,4922166,12000000,1.6


In [19]:
# Subsetting the DataFrame by loc - using axis labels. the colon is used to select a range of rows
movies_subset = movie_ratings_sorted.loc[182:561,['Title','Worldwide Gross','Production Budget','IMDB Rating']]
movies_subset


Unnamed: 0,Title,Worldwide Gross,Production Budget,IMDB Rating
182,The Shawshank Redemption,28241469,25000000,9.2
2084,Inception,753830280,160000000,9.1
2092,Toy Story 3,1046340665,200000000,8.9
1962,Pulp Fiction,212928762,8000000,8.9
790,Schindler's List,321200000,25000000,8.9
561,The Dark Knight,1022345358,185000000,8.9


#### Subsetting the DataFrame by `iloc`

In contrast, `iloc` uses the position of rows and columns, where positions take the values 0, 1, 2, 3, and so on, for rows from top to bottom and columns from left to right. In other words, the first row has position 0, the second row has position 1, and so on. Similarly, the first column from the left has position 0, the second has position 1, and so on.

In [71]:
movie_ratings_sorted.head()

Unnamed: 0,Title,US Gross,Worldwide Gross,Production Budget,Release Date,MPAA Rating,Source,Major Genre,Creative Type,IMDB Rating,IMDB Votes
182,The Shawshank Redemption,28241469,28241469,25000000,Sep 23 1994,R,Adapted screenplay,Drama,Fiction,9.2,519541
2084,Inception,285630280,753830280,160000000,Jul 16 2010,PG/PG-13,Original Screenplay,Horror/Thriller,Fiction,9.1,188247
2092,Toy Story 3,410640665,1046340665,200000000,Jun 18 2010,G,Original Screenplay,Action/Adventure,Fiction,8.9,67380
1962,Pulp Fiction,107928762,212928762,8000000,Oct 14 1994,R,Original Screenplay,Drama,Fiction,8.9,417703
790,Schindler's List,96067179,321200000,25000000,Dec 15 1993,R,Adapted screenplay,Drama,Non-Fiction,8.9,276283


In [20]:
movie_ratings_sorted.iloc[0:3,[0,2,3,9]]

Unnamed: 0,Title,Worldwide Gross,Production Budget,IMDB Rating
182,The Shawshank Redemption,28241469,25000000,9.2
2084,Inception,753830280,160000000,9.1
2092,Toy Story 3,1046340665,200000000,8.9


In [21]:
# Subsetting the DataFrame by iloc - using index of the position of rows and columns
movies_iloc_subset = movie_ratings_sorted.iloc[182:561,[0,2,3,9]]
movies_iloc_subset

Unnamed: 0,Title,Worldwide Gross,Production Budget,IMDB Rating
227,The Boy in the Striped Pyjamas,39830581,12500000,7.8
1463,Lage Raho Munnabhai,31517561,2700000,7.8
363,Coraline,124062750,60000000,7.8
1628,Lucky Number Slevin,55495466,27000000,7.8
1418,Dark City,27257061,27000000,7.8
...,...,...,...,...
1720,Coach Carter,76669806,45000000,7.1
249,Little Women,50003303,15000000,7.1
1752,Drag Me To Hell,85724728,30000000,7.1
1150,Black Snake Moan,9396870,15000000,7.1


Why does `iloc` return different rows?

In [22]:
# Subsetting the DataFrame by iloc - using index of the position of rows and columns
movies_iloc_subset1 = movie_ratings_sorted.iloc[0:10,[0,2,3,9]]
movies_iloc_subset1

Unnamed: 0,Title,Worldwide Gross,Production Budget,IMDB Rating
182,The Shawshank Redemption,28241469,25000000,9.2
2084,Inception,753830280,160000000,9.1
2092,Toy Story 3,1046340665,200000000,8.9
1962,Pulp Fiction,212928762,8000000,8.9
790,Schindler's List,321200000,25000000,8.9
561,The Dark Knight,1022345358,185000000,8.9
184,Cidade de Deus,28763397,3300000,8.8
487,The Lord of the Rings: The Fellowship of the Ring,868621686,109000000,8.8
497,The Lord of the Rings: The Return of the King,1133027325,94000000,8.8
1081,C'era una volta il West,5321508,5000000,8.8
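
One way to see what is happening: after sorting, each row keeps its original label, while positions are counted afresh from the top. The sketch below uses a made-up frame named `toy` with invented ratings, so the numbers are illustrative only.

```python
import pandas as pd

toy = pd.DataFrame(
    {'Title': ['A', 'B', 'C', 'D'], 'Rating': [6.0, 9.0, 7.5, 8.0]}
)

# Sorting reorders the rows but keeps each row's original label
toy_sorted = toy.sort_values(by='Rating', ascending=False)
print(toy_sorted.index.tolist())  # labels are no longer 0, 1, 2, 3

# loc slices by label: from label 1 through label 2, in the sorted order
by_label = toy_sorted.loc[1:2, 'Title'].tolist()

# iloc slices by position: the rows at positions 1 and 2, whatever their labels
by_position = toy_sorted.iloc[1:3, 0].tolist()

print(by_label)
print(by_position)
```

The same numbers mean different things to the two operators, which is why the two slices contain different rows.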


#### Key differences between `loc` and `iloc` in pandas

* **Indexing Type:**

    * loc uses labels (names) for indexing.
    * iloc uses integer positions for indexing.
* **Inclusion of Endpoints:**

    * In a loc slice, both endpoints are included.
    * In an iloc slice, the endpoint is excluded.

Example:

In [79]:
# Assuming you have a DataFrame like this:
import pandas as pd

data = {'A': [1, 2, 3, 4, 5],
        'B': [10, 20, 30, 40, 50],
        'C': [100, 200, 300, 400, 500]}

df = pd.DataFrame(data, index=['row1', 'row2', 'row3', 'row4', 'row5'])
df

Unnamed: 0,A,B,C
row1,1,10,100
row2,2,20,200
row3,3,30,300
row4,4,40,400
row5,5,50,500


In [23]:
# using 'loc'
df.loc['row2':'row4', 'B']

row2    20
row3    30
row4    40
Name: B, dtype: int64

In [24]:

# using 'iloc'
df.iloc[1:4, 1]

row2    20
row3    30
row4    40
Name: B, dtype: int64

Note that in the `loc` example, both 'row2' and 'row4' are included in the result, whereas in the `iloc` example, the row at position 4 is excluded. 

### Extracting rows based on a Single Condition or Multiple Conditions

In many cases, we need to filter data based on a specific condition or a combination of conditions. Next, let’s explore how to use conditions to extract the rows that meet our criteria.

In [24]:
# extracting the rows that have IMDB Rating greater than 8
movie_ratings[movie_ratings['IMDB Rating'] > 8]


Unnamed: 0,Title,US Gross,Worldwide Gross,Production Budget,Release Date,MPAA Rating,Source,Major Genre,Creative Type,IMDB Rating,IMDB Votes,ratio_wgross_by_budget
21,"Gandhi, My Father",240425,1375194,5000000,Aug 03 2007,Other,Adapted screenplay,Drama,Non-Fiction,8.1,50881,0.275039
56,Ed Wood,5828466,5828466,18000000,Sep 30 1994,R,Adapted screenplay,Comedy,Non-Fiction,8.1,74171,0.323804
67,Requiem for a Dream,3635482,7390108,4500000,Oct 06 2000,Other,Adapted screenplay,Drama,Fiction,8.5,185226,1.642246
164,Trainspotting,16501785,24000785,3100000,Jul 19 1996,R,Adapted screenplay,Drama,Fiction,8.2,150483,7.742189
181,The Wizard of Oz,28202232,28202232,2777000,Aug 25 2039,G,Adapted screenplay,Western/Musical,Fiction,8.3,102795,10.155647
...,...,...,...,...,...,...,...,...,...,...,...,...
2090,Finding Nemo,339714978,867894287,94000000,May 30 2003,G,Original Screenplay,Action/Adventure,Fiction,8.2,165006,9.232918
2092,Toy Story 3,410640665,1046340665,200000000,Jun 18 2010,G,Original Screenplay,Action/Adventure,Fiction,8.9,67380,5.231703
2094,Avatar,760167650,2767891499,237000000,Dec 18 2009,PG/PG-13,Original Screenplay,Action/Adventure,Fiction,8.3,261439,11.678867
2130,Scarface,44942821,44942821,25000000,Dec 09 1983,Other,Adapted screenplay,Drama,Fiction,8.2,152262,1.797713


To combine multiple conditions in pandas, use the `&` (AND) and `|` (OR) operators. Enclose each condition in parentheses `()`: `&` and `|` bind more tightly than comparison operators such as `>` and `<`, so without parentheses the expression is evaluated in the wrong order.

In [25]:
# extracting the rows that have IMDB Rating greater than 8 and US Gross less than 1000000
movie_ratings[(movie_ratings['IMDB Rating'] > 8) & (movie_ratings['US Gross'] < 1000000)]

Unnamed: 0,Title,US Gross,Worldwide Gross,Production Budget,Release Date,MPAA Rating,Source,Major Genre,Creative Type,IMDB Rating,IMDB Votes,ratio_wgross_by_budget
21,"Gandhi, My Father",240425,1375194,5000000,Aug 03 2007,Other,Adapted screenplay,Drama,Non-Fiction,8.1,50881,0.275039
636,Lake of Fire,25317,25317,6000000,Oct 03 2007,Other,Adapted screenplay,Documentary,Non-Fiction,8.4,1027,0.00422
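
Each condition is a boolean Series, so it can be stored in a variable and combined afterwards. The sketch below assumes a small made-up frame named `films`; the `~` (NOT) operator is not covered above and is included only as an additional illustration.

```python
import pandas as pd

films = pd.DataFrame({
    'Title': ['A', 'B', 'C', 'D'],
    'IMDB Rating': [8.5, 6.0, 8.2, 5.0],
    'US Gross': [500000, 200000, 90000000, 70000000],
})

high_rating = films['IMDB Rating'] > 8   # boolean Series, one value per row
low_gross = films['US Gross'] < 1000000

both = films[high_rating & low_gross]        # AND: both conditions hold
either = films[high_rating | low_gross]      # OR: at least one condition holds
neither = films[~(high_rating | low_gross)]  # NOT: negate the combined mask

print(both['Title'].tolist())
print(either['Title'].tolist())
print(neither['Title'].tolist())
```

Naming the masks also sidesteps the parentheses issue, because each comparison is evaluated before the masks are combined.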


**Combining `.loc` with condition(s)** to extract specific rows and columns based on criteria

In [26]:
# extracting the rows that have IMDB Rating greater than 8 and US Gross less than 1000000, keeping only the Title and IMDB Rating columns
movie_ratings[(movie_ratings['IMDB Rating'] > 8) & (movie_ratings['US Gross'] < 1000000)][['Title','IMDB Rating']]

# using loc to extract the same rows and columns in a single step
movie_ratings.loc[(movie_ratings['IMDB Rating'] > 8) & (movie_ratings['US Gross'] < 1000000),['Title','IMDB Rating']]

Unnamed: 0,Title,IMDB Rating
21,"Gandhi, My Father",8.1
636,Lake of Fire,8.4


Can you use `.iloc` for conditional filtering? Why or why not?

### Finding minimum/maximum of a column
When working with pandas, there are two main options for locating the minimum or maximum values in a DataFrame column: 

* `idxmin()` and `idxmax()`: return the **index label** of the first occurrence of the minimum or maximum value in a specified column.

In [65]:
# using idxmax and idxmin, which return the index labels of the maximum and minimum values
max_index = movie_ratings_sorted['Worldwide Gross'].idxmax()
min_index = movie_ratings_sorted['Worldwide Gross'].idxmin()
print("Max index: ", max_index)
print("Min index: ", min_index)

Max index:  2094
Min index:  896


You can use the returned index labels with `.loc` to extract the corresponding row or value.

In [66]:
print(movie_ratings_sorted.loc[max_index,'Worldwide Gross'])
print(movie_ratings_sorted.loc[min_index,'Worldwide Gross'])

2767891499
884


* `argmax()` and `argmin()`: return the **integer position** of the first occurrence of the maximum or minimum value in a column. You can use these integer positions with `.iloc` to extract the corresponding row.



In [69]:
# using argmax and argmin, which return the integer positions of the maximum and minimum values
max_position = movie_ratings_sorted['Worldwide Gross'].argmax()
min_position = movie_ratings_sorted['Worldwide Gross'].argmin()
print("max position:", max_position)
print("min position:", min_position)

# using iloc to get the Worldwide Gross (column position 2) at those row positions
print(movie_ratings_sorted.iloc[max_position, 2])
print(movie_ratings_sorted.iloc[min_position, 2])

max position: 43
min position: 2146
2767891499
884


**Additional Tips:**
* If you are dealing with non-unique or non-default indices, prefer using `idxmax()/idxmin()` to get the index labels, as `argmax()` might be less intuitive in such cases.
* For DataFrames, consider using `.idxmax(axis=1)` or `.idxmin(axis=1)` to find the max/min index labels along rows instead of columns.
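
Both tips can be tried on a frame with a non-default index. The sketch below assumes a made-up frame named `scores` with string row labels; the names and values are illustrative only.

```python
import pandas as pd

scores = pd.DataFrame(
    {'math': [70, 95, 80], 'art': [88, 60, 91]},
    index=['ann', 'bob', 'cy'],
)

label = scores['math'].idxmax()     # index label of the largest value
position = scores['math'].argmax()  # integer position of the largest value
print(label, position)

# The label pairs with loc, the position pairs with iloc
print(scores.loc[label, 'math'] == scores['math'].iloc[position])

# axis=1 looks across each row and returns the column holding the row maximum
best_subject = scores.idxmax(axis=1)
print(best_subject.tolist())
```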

## DataType and DataType Conversion

In [18]:
movie_ratings.dtypes

Title                 object
US Gross               int64
Worldwide Gross        int64
Production Budget      int64
Release Date          object
MPAA Rating           object
Source                object
Major Genre           object
Creative Type         object
IMDB Rating          float64
IMDB Votes             int64
dtype: object

The `dtypes` property returns a Series with the data type of each column in the DataFrame.

<img src="https://www.w3resource.com/w3r_images/pandas-dataframe-dtypes-1.png" width="500">

While it's common for columns containing strings to have the `object` data type, it can also include other types such as lists, dictionaries, or even mixed types within the same column. The `object` data type is a catch-all for columns that contain mixed types or types that aren't easily categorized.

### Available Data Types and Associated Built-in Functions

In a DataFrame, columns can have different data types. Here are the common data types you'll encounter and some built-in functions associated with each type:

1. **Numerical Data (int, float)**
   - Built-in functions: `mean()`, `sum()`, `min()`, `max()`, `std()`, `median()`, `quantile()`, etc.

2. **Object Data (str or mixed types)**
   - Built-in functions: `str.contains()`, `str.startswith()`, `str.endswith()`, `str.lower()`, `str.upper()`, `str.replace()`, etc.

3. **Datetime Data (datetime64)**
   - Built-in functions and attributes: `dt.year`, `dt.month`, `dt.day`, `dt.weekday`, `dt.hour`, `dt.strftime()`, etc.

These functions help in exploring and transforming the data effectively depending on the type of data in each column.
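
As a rough illustration of the three groups, the sketch below applies one operation of each kind to a small made-up frame named `films`. It assumes the date column is already in a datetime format, which is done here with `pd.to_datetime`.

```python
import pandas as pd

films = pd.DataFrame({
    'Title': ['Opal Dreams', 'Major Dundee', 'The Informers'],
    'US Gross': [14443, 14873, 315000],
    'Release Date': pd.to_datetime(['2006-11-22', '1965-04-07', '2009-04-24']),
})

# Numeric column: aggregation methods are available directly
mean_gross = films['US Gross'].mean()

# String column: string methods live under the .str accessor
starts_with_the = films['Title'].str.startswith('The')

# Datetime column: date components live under the .dt accessor
years = films['Release Date'].dt.year

print(mean_gross)
print(starts_with_the.tolist())
print(years.tolist())
```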

###  Data Type Conversion
  
When you work with a specific column, be mindful of its data type, because the data type determines which built-in functions are available.

Often, we need to convert the data types of some columns to make them suitable for analysis. For example, the data type of `Release Date` in the DataFrame `movie_ratings` is `object`. To perform datetime-related computations on this variable, we’ll need to convert it to a datetime format. We’ll use the Pandas function `to_datetime()` to convert it. Similar tools, such as the function `to_numeric()` and the method `astype()`, can be used for other conversions.

In [19]:
movie_ratings['Release Date']

0       Nov 22 2006
1       Apr 07 1965
2       Apr 24 2009
3       Jul 25 2003
4       Feb 09 2007
           ...     
2223    Jul 07 2004
2224    Jun 19 1998
2225    May 14 2010
2226    Jun 14 1991
2227    Jan 23 1998
Name: Release Date, Length: 2228, dtype: object

In [29]:
# check the data type of the Release Date column
movie_ratings['Release Date'].dtypes

dtype('O')

The output `dtype('O')` confirms that `Release Date` is currently stored with the `object` data type.

Next, we’ll use `pd.to_datetime()` to convert `Release Date` in the DataFrame to the `datetime` format:

In [75]:
movie_ratings['Release Date'] = pd.to_datetime(movie_ratings['Release Date'])

In [76]:
# Let's check the data type of the Release Date column again
movie_ratings['Release Date'].dtypes

dtype('<M8[ns]')

`dtype('<M8[ns]')` means a 64-bit datetime object with nanosecond precision stored in little-endian format. This data type is commonly used to represent timestamps in high-resolution time series data.

Next, we can use the built-in datetime functions to extract the year from this variable and create the `Release Year` column.

In [77]:
# Extracting the year from the release date
movie_ratings['Release Year'] = movie_ratings['Release Date'].dt.year
movie_ratings.head()

Unnamed: 0,Title,US Gross,Worldwide Gross,Production Budget,Release Date,MPAA Rating,Source,Major Genre,Creative Type,IMDB Rating,IMDB Votes,Release Year,ratio_wgross_by_budget
0,Opal Dreams,14443,14443,9000000,2006-11-22,PG/PG-13,Adapted screenplay,Drama,Fiction,6.5,468,2006,0.001605
1,Major Dundee,14873,14873,3800000,1965-04-07,PG/PG-13,Adapted screenplay,Western/Musical,Fiction,6.7,2588,1965,0.003914
2,The Informers,315000,315000,18000000,2009-04-24,R,Adapted screenplay,Horror/Thriller,Fiction,5.2,7595,2009,0.0175
3,Buffalo Soldiers,353743,353743,15000000,2003-07-25,R,Adapted screenplay,Comedy,Fiction,6.9,13510,2003,0.023583
4,The Last Sin Eater,388390,388390,2200000,2007-02-09,PG/PG-13,Adapted screenplay,Drama,Fiction,5.7,1012,2007,0.176541


In Pandas, the `errors='coerce'` argument is often used in data conversion, for example with the `pd.to_numeric` function. It tells Pandas to convert the values it can and to set the ones it cannot convert to `NaN`. It's a way of handling errors gracefully without raising an exception. See the textbook for an example.
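
A minimal sketch of this behavior, assuming a made-up Series named `raw` in which one entry is not a number:

```python
import pandas as pd

# Illustrative column in which one entry cannot be parsed as a number
raw = pd.Series(['12', '7.5', 'unknown', '3'])

# With errors='coerce', unparseable entries become NaN instead of raising
cleaned = pd.to_numeric(raw, errors='coerce')

print(cleaned.tolist())
print(cleaned.isna().sum())  # number of entries that could not be converted
```

Counting the `NaN` values afterwards is a quick way to check how many entries failed to convert.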

### Data Type Filtering
We can filter columns based on their data types:

In [71]:

# select just object columns
movie_ratings.select_dtypes(include='object').head()


Unnamed: 0,Title,MPAA Rating,Source,Major Genre,Creative Type
0,Opal Dreams,PG/PG-13,Adapted screenplay,Drama,Fiction
1,Major Dundee,PG/PG-13,Adapted screenplay,Western/Musical,Fiction
2,The Informers,R,Adapted screenplay,Horror/Thriller,Fiction
3,Buffalo Soldiers,R,Adapted screenplay,Comedy,Fiction
4,The Last Sin Eater,PG/PG-13,Adapted screenplay,Drama,Fiction


In [72]:
# select the numeric columns
movie_ratings.select_dtypes(include='number').head()

Unnamed: 0,US Gross,Worldwide Gross,Production Budget,IMDB Rating,IMDB Votes,Release Year,ratio_wgross_by_budget
0,14443,14443,9000000,6.5,468,2006,0.001605
1,14873,14873,3800000,6.7,2588,1965,0.003914
2,315000,315000,18000000,5.2,7595,2009,0.0175
3,353743,353743,15000000,6.9,13510,2003,0.023583
4,388390,388390,2200000,5.7,1012,2007,0.176541



### Summary statistics across rows/columns in Pandas: Numeric Columns

The Pandas DataFrame class has methods such as `sum()` and `mean()` to compute statistics over the rows or columns of a DataFrame.

By default, functions like `mean()` and `sum()` compute the statistics for each column (i.e., all rows are aggregated) in the DataFrame.
Let us look at the summary statistics again, and then compute the mean of all the numeric columns of the data:

In [83]:
movie_ratings.describe()

Unnamed: 0,US Gross,Worldwide Gross,Production Budget,IMDB Rating,IMDB Votes
count,2228.0,2228.0,2228.0,2228.0,2228.0
mean,50763700.0,101937000.0,38160550.0,6.239004,33585.154847
std,66430810.0,164858900.0,37826040.0,1.243285,47325.651561
min,0.0,884.0,218.0,1.4,18.0
25%,9646188.0,13207370.0,12000000.0,5.5,6659.25
50%,28386490.0,42668920.0,26000000.0,6.4,18169.0
75%,64531400.0,120000000.0,53000000.0,7.1,40092.75
max,760167600.0,2767891000.0,300000000.0,9.2,519541.0


In [84]:
# compute the mean of each numeric column
movie_ratings.mean(numeric_only=True)

US Gross                  5.076370e+07
Worldwide Gross           1.019370e+08
Production Budget         3.816055e+07
IMDB Rating               6.239004e+00
IMDB Votes                3.358515e+04
Release Year              2.002005e+03
ratio_wgross_by_budget    1.259483e+01
dtype: float64

 **Using the `axis` parameter**:

The `axis` parameter controls whether to compute the statistic across rows or columns:
* `axis=0` (default) aggregates over all the rows, giving one value per column.
* `axis=1` aggregates over all the columns, giving one value per row.

If the mean over a subset of columns is desired, select those columns from the data first.

For example, let us compute the mean IMDB rating, and mean IMDB votes of all the movies:


In [85]:
movie_ratings[['IMDB Rating','IMDB Votes']].mean(axis = 0)

IMDB Rating        6.239004
IMDB Votes     33585.154847
dtype: float64

 **Pandas `sum`  function**

In [86]:
data = [[10, 18, 11], [13, 15, 8], [9, 20, 3]]
df = pd.DataFrame(data )
df

Unnamed: 0,0,1,2
0,10,18,11
1,13,15,8
2,9,20,3


In [87]:
# By default, the sum method adds values across rows and returns the sum for each column
df.sum()

0    32
1    53
2    22
dtype: int64

In [88]:
# By specifying the column axis (axis='columns'), the sum() method adds values across columns and returns the sum of each row.
df.sum(axis = 'columns')

0    39
1    36
2    32
dtype: int64

In [89]:
# in pandas, axis=1 is equivalent to axis='columns', while axis=0 is equivalent to axis='index' (rows)
df.sum(axis = 1)

0    39
1    36
2    32
dtype: int64

## Writing data to a `.csv` file

The Pandas method `to_csv` can be used to write (or export) data to a CSV file. Below is an example.

In [27]:
# Exporting the movie ratings data to a csv file
movie_ratings.to_csv('../data/movie_rating_exported.csv')

In [28]:
# check if the file has been exported
os.listdir('../data')

['bestseller_books.txt',
 'country-capital-lat-long-population.csv',
 'covid.csv',
 'fifa_data.csv',
 'food_quantity.csv',
 'gas_prices.csv',
 'gdp_lifeExpectancy.csv',
 'LOTR 2.csv',
 'LOTR.csv',
 'movies.csv',
 'movies_cleaned.csv',
 'movie_ratings.csv',
 'movie_ratings.txt',
 'movie_rating_exported.csv',
 'party_nyc.csv',
 'price.csv',
 'question_json_data.json',
 'spotify_data.csv',
 'stocks.csv']
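
By default, `to_csv` also writes the row index as an unnamed first column, which is read back as `Unnamed: 0`. The sketch below illustrates this with a small made-up frame named `films`, written to a temporary directory rather than to `../data`; passing `index=False` is one way to leave the index out.

```python
import os
import tempfile
import pandas as pd

films = pd.DataFrame({
    'Title': ['Opal Dreams', 'Major Dundee'],
    'US Gross': [14443, 14873],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    with_index = os.path.join(tmp_dir, 'with_index.csv')
    without_index = os.path.join(tmp_dir, 'without_index.csv')

    films.to_csv(with_index)                  # default: row labels are written too
    films.to_csv(without_index, index=False)  # only the data columns are written

    # Read both files back while the temporary directory still exists
    cols_with_index = pd.read_csv(with_index).columns.tolist()
    cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```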

##  Reading other data formats - txt, html, json

Although `.csv` is a very popular format for structured data, data is found in several other formats as well. Some of the other data formats are `.txt`, `.html` and `.json`.

### Reading `.txt` files

The *txt* format offers some additional flexibility as compared to the *csv* format. In the *csv* format, the delimiter is a comma (or the column values are separated by a comma). However, in a *txt* file, the delimiter can be anything as desired by the user. Let us read the file *movie_ratings.txt*, where the variable values are separated by a tab character.

In [29]:
#| eval: false
movie_ratings_txt = pd.read_csv('../data/movie_ratings.txt',sep='\t')
movie_ratings_txt.head()

Unnamed: 0.1,Unnamed: 0,Title,US Gross,Worldwide Gross,Production Budget,Release Date,MPAA Rating,Source,Major Genre,Creative Type,IMDB Rating,IMDB Votes
0,0,Opal Dreams,14443,14443,9000000,Nov 22 2006,PG/PG-13,Adapted screenplay,Drama,Fiction,6.5,468
1,1,Major Dundee,14873,14873,3800000,Apr 07 1965,PG/PG-13,Adapted screenplay,Western/Musical,Fiction,6.7,2588
2,2,The Informers,315000,315000,18000000,Apr 24 2009,R,Adapted screenplay,Horror/Thriller,Fiction,5.2,7595
3,3,Buffalo Soldiers,353743,353743,15000000,Jul 25 2003,R,Adapted screenplay,Comedy,Fiction,6.9,13510
4,4,The Last Sin Eater,388390,388390,2200000,Feb 09 2007,PG/PG-13,Adapted screenplay,Drama,Fiction,5.7,1012


We use the function *read_csv* to read a *txt* file as well. However, we specify the tab character (`'\t'`) as the separator of variable values through the *sep* argument.

Note that there is no need to memorize the argument name *sep* for specifying the delimiter. You can always refer to the [read_csv()](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) documentation to find the relevant argument.
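
As a minimal sketch of how the *sep* argument behaves, the example below reads the same made-up table from tab-separated and pipe-separated text. The sample text and the use of in-memory buffers are assumptions made for illustration; they are not part of the course data.

```python
import io
import pandas as pd

# Made-up sample text: the same table with two different delimiters
tab_text = "Title\tIMDB Rating\nOpal Dreams\t6.5\nMajor Dundee\t6.7\n"
pipe_text = "Title|IMDB Rating\nOpal Dreams|6.5\nMajor Dundee|6.7\n"

tab_df = pd.read_csv(io.StringIO(tab_text), sep="\t")
pipe_df = pd.read_csv(io.StringIO(pipe_text), sep="|")

# With the right separator, both reads give the same two-column table
print(tab_df.equals(pipe_df))

# With the default comma separator, each line ends up in a single column
wrong_df = pd.read_csv(io.StringIO(tab_text))
print(wrong_df.shape)
```

A table that comes back with a single column is usually a sign that the wrong delimiter was used; the column name then shows the character that actually separates the values.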

### Practice exercise 4

Read the file *bestseller_books.txt*. It contains the top 50 best-selling books on Amazon for each year from 2009 to 2019. Identify the delimiter without opening the file in Notepad or any other text-editing software. How many rows and columns are there in the dataset?

**Solution:**

In [31]:
#The delimiter seems to be ';' based on the output of the above code
bestseller_books = pd.read_csv('../data/bestseller_books.txt',sep=';')
bestseller_books.head()

Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre
0,0,0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction
1,1,1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction
2,2,2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction
3,3,3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction
4,4,4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019,Non Fiction


In [None]:
#The file read with ';' as the delimiter is correct
print("The file has",bestseller_books.shape[0],"rows and",bestseller_books.shape[1],"columns")

The file has 550 rows and 9 columns


Alternatively, you can use the arguments `sep = None` and `engine = 'python'`. The default engine is C, which cannot detect the delimiter on its own. The 'python' engine, however, has a 'sniffer' tool (Python's built-in `csv.Sniffer`) that can often identify the delimiter automatically.

In [32]:
bestseller_books = pd.read_csv('../data/bestseller_books.txt',sep=None, engine = 'python')
bestseller_books.head()

Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre
0,0,0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction
1,1,1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction
2,2,2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction
3,3,3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction
4,4,4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019,Non Fiction
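
To see what the sniffer does, the standard-library class `csv.Sniffer` can also be called directly on a sample of text. The sketch below uses a few made-up lines in the style of the bestseller file; the list of candidate delimiters passed to `sniff` is an assumption chosen for this example.

```python
import csv

# Made-up sample lines in the style of the bestseller data
sample = (
    "Name;Author;Year\n"
    "1984 (Signet Classics);George Orwell;2017\n"
    "11/22/63: A Novel;Stephen King;2011\n"
)

# Ask the sniffer to choose among a few common delimiters
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
print(repr(dialect.delimiter))
```

The sniffer picks the candidate character that occurs the same number of times on every line, which is why it can fail on very short or irregular samples.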


### Reading HTML data

The *Pandas* function *read_html* searches for tabular data, i.e., data contained within the *\<table\>* tags of an html file. Let us read the tables in the GDP per capita [page](https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)_per_capita) on Wikipedia.

In [33]:
#Reading all the tables from the Wikipedia page on GDP per capita
tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)_per_capita')

All the tables will be read and stored in the variable *tables*. Let us find the datatype of the variable *tables*.

In [34]:
#Finding the datatype of the variable - tables
type(tables)

list

The variable *tables* is a list of all the tables read from the HTML page, each stored as a `DataFrame`.

In [35]:
#Number of tables read from the page
len(tables)

6

The built-in function *len* returns the length of the list *tables*, i.e., the number of tables read from the Wikipedia page. Let us check out the first table.

In [36]:
#Checking out the first table. Note that the index of the first table will be 0.
tables[0]

Unnamed: 0,0,1,2
0,">$60,000 $50,000–$60,000 $40,000–$50,000 $30,0...","$20,000–$30,000 $10,000–$20,000 $5,000–$10,000...","$1,000–$2,500 $500–$1,000 <$500 No data"


The above table doesn't seem to be useful. Let us check out the second table.

In [37]:
#Checking out the second table. Note that the index of the second table will be 1.
tables[1]

Unnamed: 0_level_0,Country/Territory,IMF[4][5],IMF[4][5],World Bank[6],World Bank[6],United Nations[7],United Nations[7]
Unnamed: 0_level_1,Country/Territory,Estimate,Year,Estimate,Year,Estimate,Year
0,Monaco,—,—,240862,2022,234317,2021
1,Liechtenstein,—,—,187267,2022,169260,2021
2,Luxembourg,131384,2024,128259,2023,133745,2021
3,Bermuda,—,—,123091,2022,112653,2021
4,Ireland,106059,2024,103685,2023,101109,2021
...,...,...,...,...,...,...,...
218,Malawi,481,2024,673,2023,613,2021
219,South Sudan,422,2024,1072,2015,400,2021
220,Afghanistan,422,2022,353,2022,373,2021
221,Syria,—,—,421,2021,925,2021


The above table contains the estimated GDP per capita of all countries. This is the table that is likely to be relevant to a user interested in analyzing GDP per capita of countries. Instead of reading all the tables of an HTML file, we can restrict the search to tables containing certain relevant keywords. Let us try searching for all tables containing the word 'Country'.

In [38]:
#Reading all the tables from the Wikipedia page on GDP per capita, containing the word 'Country'
tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)_per_capita', match = 'Country')

The *match* argument specifies the text (a string or a regular expression) that a table must contain in order to be read.

In [39]:
len(tables)

1

Only one table contains the keyword - 'Country'. Let us check out the table obtained.

In [40]:
#Table having the keyword - 'Country' from the HTML page
tables[0]

Unnamed: 0_level_0,Country/Territory,IMF[4][5],IMF[4][5],World Bank[6],World Bank[6],United Nations[7],United Nations[7]
Unnamed: 0_level_1,Country/Territory,Estimate,Year,Estimate,Year,Estimate,Year
0,Monaco,—,—,240862,2022,234317,2021
1,Liechtenstein,—,—,187267,2022,169260,2021
2,Luxembourg,131384,2024,128259,2023,133745,2021
3,Bermuda,—,—,123091,2022,112653,2021
4,Ireland,106059,2024,103685,2023,101109,2021
...,...,...,...,...,...,...,...
218,Malawi,481,2024,673,2023,613,2021
219,South Sudan,422,2024,1072,2015,400,2021
220,Afghanistan,422,2022,353,2022,373,2021
221,Syria,—,—,421,2021,925,2021


The argument *match* helps with a more focussed search, and helps us discard irrelevant tables.
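
The GDP table above has a two-level column header (for example, `IMF[4][5]` over `Estimate` and `Year`), and missing estimates are shown as a dash. The sketch below shows one possible way to flatten such a header and convert a column to numbers. It uses a small hand-built DataFrame that mimics the layout, with a few values copied from the output above; the assumption that the values arrive as strings is made only for illustration.

```python
import pandas as pd

# Hand-built frame mimicking the two-level header of the GDP table
columns = pd.MultiIndex.from_tuples([
    ("Country/Territory", "Country/Territory"),
    ("IMF[4][5]", "Estimate"),
    ("IMF[4][5]", "Year"),
    ("World Bank[6]", "Estimate"),
    ("World Bank[6]", "Year"),
])
missing = "\u2014"  # the dash character the table uses for missing values
gdp = pd.DataFrame(
    [["Monaco", missing, missing, "240862", "2022"],
     ["Luxembourg", "131384", "2024", "128259", "2023"]],
    columns=columns,
)

# Join the two header levels into one label, collapsing repeated labels
gdp.columns = [top if top == bottom else f"{top} {bottom}"
               for top, bottom in gdp.columns]

# Convert the estimates to numbers; entries that are not numbers become NaN
gdp["IMF[4][5] Estimate"] = pd.to_numeric(gdp["IMF[4][5] Estimate"],
                                          errors="coerce")
print(gdp.columns.tolist())
print(gdp)
```

Single-level column names are easier to select, and numeric columns are needed before computing summaries such as means or rankings.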

### Practice exercise 5

Read the table(s) listing the teams that reached the top four in the FIFA World Cup from this [page](https://en.wikipedia.org/wiki/FIFA_World_Cup). Read only those table(s) that have the word *'reaching'* in them. How many rows and columns are there in the table(s)?

In [41]:
dfs = pd.read_html('https://en.wikipedia.org/wiki/FIFA_World_Cup',
                       match='reaching')
print(len(dfs))
data = dfs[0]
print("Number of rows =",data.shape[0], "and number of columns=",data.shape[1])
data.head()

1
Number of rows = 25 and number of columns= 6


Unnamed: 0,Team,Titles,Runners-up,Third place,Fourth place,Top 4 total
0,Brazil,"5 (1958, 1962, 1970, 1994, 2002)","2 (1950 *, 1998)","2 (1938, 1978)","2 (1974, 2014 *)",11
1,Germany1,"4 (1954, 1974 *, 1990, 2014)","4 (1966, 1982, 1986, 2002)","4 (1934, 1970, 2006 *, 2010)",1 (1958),13
2,Italy,"4 (1934 *, 1938, 1982, 2006)","2 (1970, 1994)",1 (1990 *),1 (1978),8
3,Argentina,"3 (1978 *, 1986, 2022)","3 (1930, 1990, 2014)",,,6
4,France,"2 (1998 *, 2018)","2 (2006, 2022)","2 (1958, 1986)",1 (1982),7


### Reading JSON data

JSON stands for JavaScript Object Notation, a format in which data is stored and transmitted as plain text. A couple of benefits of the JSON format are:

1. Since the format is text only, JSON data can easily be exchanged between web applications, and used by any programming language. 

2. Unlike the *csv* format, JSON supports a hierarchical data structure, and is easier to integrate with APIs. 

The JSON format can support a hierarchical data structure, as it is built on the following two data structures (*Source: [technical documentation](https://www.json.org/json-en.html)*):

- A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
- An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.

These are universal data structures. Virtually all modern programming languages support them in one form or another. It makes sense that a data format that is interchangeable with programming languages also be based on these structures.

The *Pandas* function [read_json](https://pandas.pydata.org/docs/reference/api/pandas.read_json.html) converts a JSON string to a Pandas DataFrame. The function *dumps()* of the *json* library converts a Python object to a JSON string.
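
A minimal sketch of these two functions working together is shown below: a small made-up list of records is converted to a JSON string with `json.dumps`, and the string is then read into a DataFrame with `read_json`. Wrapping the string in `io.StringIO` is a choice made here so that the string is treated as file-like input.

```python
import io
import json
import pandas as pd

# Made-up records: a list of name/value collections
records = [
    {"speaker": "David Pogue", "year_filmed": 2006},
    {"speaker": "Craig Venter", "year_filmed": 2005},
]

# Python object -> JSON string
json_string = json.dumps(records)
print(json_string)

# JSON string -> DataFrame
talks = pd.read_json(io.StringIO(json_string))
print(talks)
```

Each dictionary becomes one row, and the names in the name/value pairs become the column names.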

Let's read the JSON data on TED Talks.

In [42]:
tedtalks_data = pd.read_json('https://raw.githubusercontent.com/cwkenwaysun/TEDmap/master/data/TED_Talks.json')

In [43]:
tedtalks_data.head()

Unnamed: 0,id,speaker,headline,URL,description,transcript_URL,month_filmed,year_filmed,event,duration,date_published,tags,newURL,date,views,rates
0,7,David Pogue,Simplicity sells,http://www.ted.com/talks/view/id/7,New York Times columnist David Pogue takes aim...,http://www.ted.com/talks/view/id/7/transcript?...,2,2006,TED2006,0:21:26,6/27/06,"simplicity,computers,software,interface design...",https://www.ted.com/talks/david_pogue_says_sim...,2006-06-27,1646773,"[{'id': 7, 'name': 'Funny', 'count': 968}, {'i..."
1,6,Craig Venter,Sampling the ocean's DNA,http://www.ted.com/talks/view/id/6,Genomics pioneer Craig Venter takes a break fr...,http://www.ted.com/talks/view/id/6/transcript?...,7,2005,TEDGlobal 2005,0:16:51,2004/05/07,"biotech,invention,oceans,genetics,DNA,biology,...",https://www.ted.com/talks/craig_venter_on_dna_...,2004-05-07,562625,"[{'id': 3, 'name': 'Courageous', 'count': 21},..."
2,4,Burt Rutan,The real future of space exploration,http://www.ted.com/talks/view/id/4,"In this passionate talk, legendary spacecraft ...",http://www.ted.com/talks/view/id/4/transcript?...,2,2006,TED2006,0:19:37,10/25/06,"aircraft,flight,industrial design,NASA,rocket ...",https://www.ted.com/talks/burt_rutan_sees_the_...,2006-10-25,2046869,"[{'id': 3, 'name': 'Courageous', 'count': 169}..."
3,3,Ashraf Ghani,How to rebuild a broken state,http://www.ted.com/talks/view/id/3,Ashraf Ghani's passionate and powerful 10-minu...,http://www.ted.com/talks/view/id/3/transcript?...,7,2005,TEDGlobal 2005,0:18:45,10/18/06,"corruption,poverty,economics,investment,milita...",https://www.ted.com/talks/ashraf_ghani_on_rebu...,2006-10-18,814554,"[{'id': 3, 'name': 'Courageous', 'count': 140}..."
4,5,Chris Bangle,Great cars are great art,http://www.ted.com/talks/view/id/5,American designer Chris Bangle explains his ph...,http://www.ted.com/talks/view/id/5/transcript?...,2,2002,TED2002,0:20:04,2004/05/07,"cars,industrial design,transportation,inventio...",https://www.ted.com/talks/chris_bangle_says_gr...,2004-05-07,870950,"[{'id': 1, 'name': 'Beautiful', 'count': 89}, ..."


In [44]:
#| echo: false
import json
with open("../data/question_json_data.json", "r") as file:
    questions=json.load(file)
#display_quiz(questions)
questions

[{'question': "What is the data type of values in the last column (named 'rates') of the above dataset on ted talks",
  'type': 'multiple_choice',
  'answers': [{'answer': 'list', 'correct': True, 'feedback': 'Correct!'},
   {'answer': 'string',
    'correct': False,
    'feedback': 'Incorrect. Use the type function on the variable to find its datatype.'},
   {'answer': 'numeric',
    'correct': False,
    'feedback': 'Incorrect. Use the type function on the variable to find its datatype.'},
   {'answer': 'dictionary',
    'correct': False,
    'feedback': 'Incorrect. Use the type function on the variable to find its datatype.'}]}]

This JSON data contains nested structures, such as lists and dictionaries, which need additional processing before they can be arranged into a flat table. We will address this in future lectures.
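
As a brief preview, the sketch below shows one possible way to flatten such a nested column with `pd.json_normalize`. The two records are made up for illustration, reusing a couple of values visible in the `rates` column above.

```python
import pandas as pd

# Made-up records shaped like the TED Talks data: 'rates' holds a list of dictionaries
talks = [
    {"speaker": "David Pogue",
     "rates": [{"id": 7, "name": "Funny", "count": 968}]},
    {"speaker": "Craig Venter",
     "rates": [{"id": 3, "name": "Courageous", "count": 21}]},
]

# Each value of 'rates' is a Python list
print(type(talks[0]["rates"]))

# One row per rating, with the speaker repeated alongside it
flat = pd.json_normalize(talks, record_path="rates", meta=["speaker"])
print(flat)
```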

### Practice exercise 6

Read the movies dataset from [here](https://raw.githubusercontent.com/vega/vega-datasets/master/data/movies.json). How many rows and columns are there in the data?

In [48]:
movies_data = pd.read_json('https://raw.githubusercontent.com/vega/vega-datasets/master/data/movies.json')
print("Number of rows =",movies_data.shape[0], "and number of columns=",movies_data.shape[1])

Number of rows = 3201 and number of columns= 16


###  Reading data from a URL in Python

Reading data from a URL typically involves the `requests` library, which allows you to send HTTP requests and handle responses easily.

If it is not already installed, you can install it with `pip install requests`.

We'll use the CoinGecko API, which provides cryptocurrency market data. Here’s an example of how to retrieve current market data:

In [54]:
import requests

# Define the URL of the API
url = 'https://api.coingecko.com/api/v3/coins/markets?vs_currency=usd'

# Send a GET request to the URL (the timeout avoids waiting indefinitely)
response = requests.get(url, timeout=10)

# Check if the request was successful
if response.status_code == 200:
    # Parse the JSON data
    data = response.json()
    print(data)
else:
    print(f"Failed to retrieve data: {response.status_code}")


[{'id': 'bitcoin', 'symbol': 'btc', 'name': 'Bitcoin', 'image': 'https://coin-images.coingecko.com/coins/images/1/large/bitcoin.png?1696501400', 'current_price': 62490, 'market_cap': 1235032675967, 'market_cap_rank': 1, 'fully_diluted_valuation': 1312231173412, 'total_volume': 34554624888, 'high_24h': 64500, 'low_24h': 62100, 'price_change_24h': -1170.466984852057, 'price_change_percentage_24h': -1.83862, 'market_cap_change_24h': -23150749922.800293, 'market_cap_change_percentage_24h': -1.84001, 'circulating_supply': 19764571.0, 'total_supply': 21000000.0, 'max_supply': 21000000.0, 'ath': 73738, 'ath_change_percentage': -15.21931, 'ath_date': '2024-03-14T07:10:36.635Z', 'atl': 67.81, 'atl_change_percentage': 92093.56211, 'atl_date': '2013-07-06T00:00:00.000Z', 'roi': None, 'last_updated': '2024-10-08T00:54:10.271Z'}, {'id': 'ethereum', 'symbol': 'eth', 'name': 'Ethereum', 'image': 'https://coin-images.coingecko.com/coins/images/279/large/ethereum.png?1696501628', 'current_price': 2434.
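
The parsed response is a list of dictionaries, one per coin, so it can be passed directly to `pd.DataFrame` for analysis. The sketch below does this for a small hand-copied sample with only two of the fields, taken from the output in this section, so that it runs without calling the API; the name `sample_data` is hypothetical.

```python
import pandas as pd

# Hand-copied sample in the same list-of-dictionaries shape as response.json()
sample_data = [
    {"name": "Bitcoin", "current_price": 62490},
    {"name": "Ethereum", "current_price": 2434.1},
    {"name": "Tether", "current_price": 0.999711},
]

# Each dictionary becomes a row and each key becomes a column
coins = pd.DataFrame(sample_data)
print(coins.sort_values("current_price", ascending=False))
```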

In [53]:
# Loop through the data and print the name and current price
for coin in data:
    name = coin['name']
    price = coin['current_price']
    print(f"Coin: {name}, Price: ${price}")


Coin: Bitcoin, Price: $62490
Coin: Ethereum, Price: $2434.1
Coin: Tether, Price: $0.999711
Coin: BNB, Price: $568.59
Coin: Solana, Price: $144.62
Coin: USDC, Price: $0.99982
Coin: XRP, Price: $0.531761
Coin: Lido Staked Ether, Price: $2433.54
Coin: Dogecoin, Price: $0.109138
Coin: TRON, Price: $0.156189
Coin: Toncoin, Price: $5.23
Coin: Cardano, Price: $0.35311
Coin: Avalanche, Price: $26.86
Coin: Shiba Inu, Price: $1.759e-05
Coin: Wrapped stETH, Price: $2870.49
Coin: Wrapped Bitcoin, Price: $62339
Coin: WETH, Price: $2434.35
Coin: Chainlink, Price: $11.22
Coin: Bitcoin Cash, Price: $325.66
Coin: Polkadot, Price: $4.15
Coin: Dai, Price: $0.999811
Coin: Sui, Price: $2.06
Coin: NEAR Protocol, Price: $5.11
Coin: LEO Token, Price: $6.01
Coin: Uniswap, Price: $7.24
Coin: Litecoin, Price: $65.04
Coin: Bittensor, Price: $617.07
Coin: Aptos, Price: $8.91
Coin: Pepe, Price: $9.89e-06
Coin: Wrapped eETH, Price: $2554.88
Coin: Artificial Superintelligence Alliance, Price: $1.49
Coin: Internet Com