
## Overview

This notebook shows how to read and write data in PySpark with the DataFrame API, using several methods and options.

#### **Contents :**
- Reading the file via Dataframe API
- Read Modes in PySpark
- Way 1 : Reading file via `DataFrameReader.format().load()` method 
- Way 2 : Reading file via `DataFrameReader.csv()` method 
- Read Multiple Files in PySpark
- Reading CSV File Options 
- Writing CSV File Options
- Save Modes in PySpark
- Way 1 : Writing file via `DataFrame.write.csv()` method 
- Way 2 : Writing file via `DataFrame.write.format().save()` method 
- Save Dataframes into Persistent Tables 
- Generic File Source Options

This is a **Python** notebook, so the default cell language is Python. You can switch languages per cell with a `%LANGUAGE` magic command: `%python`, `%scala`, `%sql` and `%r` are supported, and `%fs` runs file system commands.

**Input CSV File Used :**
- https://github.com/databricks/Spark-The-Definitive-Guide/blob/master/data/flight-data/csv/2010-summary.csv

**Spark Read/Write Documentation Link**
- https://spark.apache.org/docs/latest/sql-data-sources.html

#### Reading the file via Dataframe API

We can load data into Spark with the `DataFrameReader` API and write it out with the `DataFrameWriter` API. The `DataFrameReader` is accessed through the `spark.read` attribute (a property, so it is used without parentheses).

By utilizing `DataFrameReader.csv("path")` or `DataFrameReader.format("csv").load("path")` methods, we can read a CSV file into a PySpark DataFrame. These methods accept a file path as their parameter.

Settings used when reading a file through `spark.read` :
1. **format (optional) :** Data file format such as csv, json, parquet, orc, jdbc *(default format is parquet)*
2. **option (optional) :** To set up different options for file reading such as *inferSchema, mode, header, path*
3. **schema (optional) :** To pass a manually defined schema
4. **load (required) :** Path where our data is residing 

#### Read Modes in PySpark 
In PySpark, when reading data into a DataFrame from external sources, you can specify a read mode to control how Spark handles malformed or corrupt records and schema mismatches. The available read modes depend on the data source.

| Mode  | Meaning |
| ------------- | ------------- |
| **FAILFAST**  | This mode fails the reading process if it encounters any malformed/corrupted data or schema mismatch. |
| **DROPMALFORMED**  | This mode drops any row that contains malformed data (e.g., extra columns). |
| **PERMISSIVE**  | This is the default mode. PySpark reads as much data as possible, sets the fields it cannot parse to `null`, and stores the raw malformed record in a `_corrupt_record` column. |
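
A minimal sketch of how a read mode from the table can be passed to the CSV reader. The helper names `normalize_read_mode` and `read_csv_with_mode` are hypothetical, and `spark` is assumed to be the active SparkSession that a Databricks notebook provides. The validation step is an illustration only, added so that a misspelled mode name is caught before the read starts.

```python
# Read modes named in the table above.
READ_MODES = ("PERMISSIVE", "DROPMALFORMED", "FAILFAST")


def normalize_read_mode(mode: str) -> str:
    """Upper-case a read mode and reject anything that is not in the table."""
    normalized = mode.strip().upper()
    if normalized not in READ_MODES:
        raise ValueError(f"unknown read mode {mode!r}; expected one of {READ_MODES}")
    return normalized


def read_csv_with_mode(spark, path, mode="PERMISSIVE"):
    """Read a CSV file that has a header row, using the given read mode.

    `spark` is assumed to be an active SparkSession.
    """
    return (
        spark.read.format("csv")
        .option("header", True)
        .option("mode", normalize_read_mode(mode))
        .load(path)
    )


# Hypothetical call inside the notebook (not executed here):
# df_strict = read_csv_with_mode(spark, "/FileStore/tables/2010_summary.csv", "failfast")
```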

#### Way 1 : Reading file via `DataFrameReader.format().load()` method 

In [0]:
# File location and type
file_location = "/FileStore/tables/2010_summary.csv"
file_type = "csv"

# CSV options
infer_schema = "false"
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

# Display the result dataframe 
display(df.head(5))
df.printSchema()  # printSchema() prints the schema itself and returns None, so display() is not needed

DEST_COUNTRY_NAME,ORIGIN_COUNTRY_NAME,count
United States,Romania,1
United States,Ireland,264
United States,India,69
Egypt,United States,24
Equatorial Guinea,United States,1


root
 |-- DEST_COUNTRY_NAME: string (nullable = true)
 |-- ORIGIN_COUNTRY_NAME: string (nullable = true)
 |-- count: string (nullable = true)



#### Way 2 : Reading file via `DataFrameReader.csv()` method 

In [0]:
# Read CSV File
df_read = spark.read.csv("/FileStore/tables/2010_summary.csv")

# Display the result dataframe 
display(df_read.head(5))
df_read.printSchema()

_c0,_c1,_c2
DEST_COUNTRY_NAME,ORIGIN_COUNTRY_NAME,count
United States,Romania,1
United States,Ireland,264
United States,India,69
Egypt,United States,24


root
 |-- _c0: string (nullable = true)
 |-- _c1: string (nullable = true)
 |-- _c2: string (nullable = true)



#### Read Multiple Files in PySpark

In [0]:
df_read = spark.read.option("inferSchema", True) \
                .option("delimiter", ",") \
                .option("header", True) \
                .csv(['/FileStore/tables/2010_summary.csv', '/FileStore/tables/2010_summary_write.csv'])

## OR ##

df_read = spark.read.format('csv') \
  .option("inferSchema", True) \
  .option("header", True) \
  .option("sep", ',') \
  .load(["/FileStore/tables/2010_summary_write.csv", "/FileStore/tables/2010_summary.csv"])

# Display the result dataframe 
display(df_read.head(5))

#### Reading CSV File Options 
The PySpark CSV data source provides many options for working with CSV files. The most important ones are explained below with examples. We can either chain `option()` calls to set several options or use the alternative `options()` method.

**Syntax**
- option(self, key, value) # Using single options
- options(self, **options) # Using multiple options

###### 1. Header Option
If the input file has a header row with column names, set the header option explicitly with `option("header", True)`. Otherwise, the API treats the header row as a data record.

This option makes Spark read the first line of the CSV file as column names. Its default value is `False`. Unless `inferSchema` is enabled or a schema is supplied, every column is read as a `string`.

In [0]:
# Read CSV File
df_read = spark.read.option('header', True).csv("/FileStore/tables/2010_summary.csv")

# Display the result dataframe 
display(df_read.head(5))
df_read.printSchema()

DEST_COUNTRY_NAME,ORIGIN_COUNTRY_NAME,count
United States,Romania,1
United States,Ireland,264
United States,India,69
Egypt,United States,24
Equatorial Guinea,United States,1


root
 |-- DEST_COUNTRY_NAME: string (nullable = true)
 |-- ORIGIN_COUNTRY_NAME: string (nullable = true)
 |-- count: string (nullable = true)



###### 2. Delimiter Option
`delimiter` option is used to specify the column delimiter of the CSV file. By default, it is **comma (,)** character, but can be set to any character like **pipe(|)**, **tab (\t)**, **space** using this option.

*Quote Option* : When a column value contains the delimiter character, use the `quote` option to specify the quote character. By default it is the double quote (`"`), and delimiters inside quotes are treated as part of the value. You can set it to any single character with this option.
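
To make the quoting rule concrete, the sketch below parses one made-up row with Python's standard `csv` module rather than with Spark. It only illustrates the behaviour described above, namely that a delimiter inside the quote character stays part of the value. The `quote_options` dict at the end is an assumption about how the same settings would be handed to `spark.read.options(...)`.

```python
import csv
import io

# A made-up row whose second field contains the delimiter inside quotes.
raw = 'United States,"Bonaire, Sint Eustatius, and Saba",59\n'

# The stdlib reader keeps commas that appear inside the quote character,
# which mirrors what the Spark `quote` option describes.
fields = next(csv.reader(io.StringIO(raw), delimiter=",", quotechar='"'))
print(fields)  # ['United States', 'Bonaire, Sint Eustatius, and Saba', '59']

# The corresponding Spark reader options, collected in a dict so they could be
# passed as spark.read.options(**quote_options).csv(path) in the notebook.
quote_options = {"header": "true", "delimiter": ",", "quote": '"'}
```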

In [0]:
# Read CSV File
df_read = spark.read\
    .option('delimiter', ',')\
    .option('header', True)\
    .csv("/FileStore/tables/2010_summary.csv")

# Display the result dataframe 
display(df_read.head(5))
df_read.printSchema()

DEST_COUNTRY_NAME,ORIGIN_COUNTRY_NAME,count
United States,Romania,1
United States,Ireland,264
United States,India,69
Egypt,United States,24
Equatorial Guinea,United States,1


root
 |-- DEST_COUNTRY_NAME: string (nullable = true)
 |-- ORIGIN_COUNTRY_NAME: string (nullable = true)
 |-- count: string (nullable = true)



###### 3. InferSchema Option
The default value of this option is `False`. When set to `True`, Spark infers column types from the data. Note that this requires an extra pass over the data. The cells below do not set `header`, so the header row is read as data and every column is still inferred as `string`.

In [0]:
# Read CSV File using inferschema and delimiter
df_read = spark.read.options(inferSchema='True',delimiter=',')\
  .csv("/FileStore/tables/2010_summary.csv")

# Display the result dataframe 
display(df_read.head(5))
df_read.printSchema()

_c0,_c1,_c2
DEST_COUNTRY_NAME,ORIGIN_COUNTRY_NAME,count
United States,Romania,1
United States,Ireland,264
United States,India,69
Egypt,United States,24


root
 |-- _c0: string (nullable = true)
 |-- _c1: string (nullable = true)
 |-- _c2: string (nullable = true)



In [0]:
# Define read options
options = {
    "inferSchema": "True",
    "delimiter": ","
}

# Read a CSV file with specified options
df_read = spark.read.options(**options).csv("/FileStore/tables/2010_summary.csv")

# Display the result dataframe 
display(df_read.head(5))
df_read.printSchema()

_c0,_c1,_c2
DEST_COUNTRY_NAME,ORIGIN_COUNTRY_NAME,count
United States,Romania,1
United States,Ireland,264
United States,India,69
Egypt,United States,24


root
 |-- _c0: string (nullable = true)
 |-- _c1: string (nullable = true)
 |-- _c2: string (nullable = true)



In [0]:
# Read a CSV file with chaining multiple options
df_read = spark.read.option("inferSchema",True) \
                .option("delimiter",",") \
                .csv("/FileStore/tables/2010_summary.csv")

# Display the result dataframe 
display(df_read.head(5))
df_read.printSchema()

_c0,_c1,_c2
DEST_COUNTRY_NAME,ORIGIN_COUNTRY_NAME,count
United States,Romania,1
United States,Ireland,264
United States,India,69
Egypt,United States,24


root
 |-- _c0: string (nullable = true)
 |-- _c1: string (nullable = true)
 |-- _c2: string (nullable = true)



###### 4. Specify Custom Schema Option
Reading CSV files with a user-specified custom schema in PySpark involves defining the schema explicitly before loading the data. You can define the schema for the CSV file by specifying the column names and data types using the `StructType` and `StructField` classes. These are from the `pyspark.sql.types` module.

Using a user-specified custom schema provides flexibility in handling CSV files with specific data types or column names, ensuring that the DataFrame accurately represents the data according to the user’s requirements.

In [0]:
# Imports
from pyspark.sql.types import StructType, StructField, StringType, IntegerType 
from pyspark.sql.types import ArrayType, DoubleType, BooleanType

# Using custom schema
schema = StructType() \
      .add("DEST_COUNTRY_NAME",StringType(),True) \
      .add("ORIGIN_COUNTRY_NAME",StringType(),True) \
      .add("count",IntegerType(),True) 
      
df_with_schema = spark.read.format("csv") \
      .option("header", True) \
      .schema(schema) \
      .load("/FileStore/tables/2010_summary.csv")

# Display the result dataframe 
display(df_with_schema.head(5))
df_with_schema.printSchema()

DEST_COUNTRY_NAME,ORIGIN_COUNTRY_NAME,count
United States,Romania,1
United States,Ireland,264
United States,India,69
Egypt,United States,24
Equatorial Guinea,United States,1


root
 |-- DEST_COUNTRY_NAME: string (nullable = true)
 |-- ORIGIN_COUNTRY_NAME: string (nullable = true)
 |-- count: integer (nullable = true)
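
As an alternative to building a `StructType`, `DataFrameReader.schema()` also accepts a DDL-formatted string. The sketch below assembles such a string for the three columns of this file. The helper name `read_with_ddl_schema` is hypothetical and `spark` is assumed to be an active SparkSession.

```python
# Column names and Spark SQL type names for the flight summary file.
columns = [
    ("DEST_COUNTRY_NAME", "STRING"),
    ("ORIGIN_COUNTRY_NAME", "STRING"),
    ("count", "INT"),
]

# Backticks quote each column name so that a name such as `count`
# is always read as an identifier.
schema_ddl = ", ".join(f"`{name}` {dtype}" for name, dtype in columns)
print(schema_ddl)
# `DEST_COUNTRY_NAME` STRING, `ORIGIN_COUNTRY_NAME` STRING, `count` INT


def read_with_ddl_schema(spark, path):
    """Read a CSV file with a header row using the DDL schema string.

    `spark` is assumed to be an active SparkSession.
    """
    return spark.read.format("csv").option("header", True).schema(schema_ddl).load(path)


# Hypothetical call inside the notebook (not executed here):
# df_ddl = read_with_ddl_schema(spark, "/FileStore/tables/2010_summary.csv")
```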



#### Writing CSV File Options 
When writing a DataFrame to a CSV file in PySpark, you can specify various options to customize the output. These options can be set using the option() method of the DataFrameWriter class. Here’s how to use write options with a CSV file:

1. **header:** Specifies whether to include a header row with column names in the CSV file. Example: option("header", "true").
2. **delimiter:** Specifies the delimiter to use between fields in the CSV file. Example: option("delimiter", ",").
3. **quote:** Specifies the character used for quoting fields in the CSV file. Example: option("quote", "\"").
4. **escape:** Specifies the escape character used in the CSV file. Example: option("escape", "\\").
5. **nullValue:** Specifies the string to represent null values in the CSV file. Example: option("nullValue", "NA").
6. **dateFormat:** Specifies the date format to use for date columns. Example: option("dateFormat", "yyyy-MM-dd").
7. **mode:** Specifies the save mode for the output. Values include "overwrite", "append", "ignore", and "error" (alias "errorifexists"). Unlike the options above, it is set with the `mode()` method of the writer rather than `option()`. Example: mode("overwrite").
8. **compression:** Specifies the compression codec to use for the output file. Example: option("compression", "gzip"). This can be one of the known case-insensitive short names (none, bzip2, gzip, lz4, snappy and deflate). CSV built-in functions ignore this option.
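
The write options can be collected in a dict and passed in one call through `DataFrameWriter.options()`. The sketch below is one way to do that. The helper names `csv_write_options` and `write_csv` are hypothetical, `df` is assumed to be a PySpark DataFrame, and the save mode is passed through `mode()` in this sketch.

```python
def csv_write_options(header=True, delimiter=",", null_value=None, compression=None):
    """Build a dict of CSV writer options, leaving out the ones that were not set."""
    options = {"header": str(header).lower(), "delimiter": delimiter}
    if null_value is not None:
        options["nullValue"] = null_value
    if compression is not None:
        options["compression"] = compression
    return options


def write_csv(df, path, save_mode="errorifexists", **kwargs):
    """Write `df` (assumed to be a PySpark DataFrame) as CSV files under `path`."""
    df.write.options(**csv_write_options(**kwargs)).mode(save_mode).csv(path)


print(csv_write_options(null_value="NA", compression="gzip"))
# {'header': 'true', 'delimiter': ',', 'nullValue': 'NA', 'compression': 'gzip'}

# Hypothetical call inside the notebook (not executed here):
# write_csv(df_read, "/FileStore/tables/2010_summary_gz", save_mode="overwrite", compression="gzip")
```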


#### Save Modes in PySpark

Save operations can optionally take a SaveMode, that specifies how to handle existing data if present. It is important to realize that these save modes do not utilize any locking and are not atomic. Additionally, when performing an Overwrite, the data will be deleted before writing out the new data.

We can specify different saving modes while writing PySpark DataFrame to disk. These saving modes specify how to write a file to disk.

| Mode  | Meaning |
| ------------- | ------------- |
| **overwrite**  | overwrite mode means that when saving a DataFrame to a data source, if data/table already exists, existing data is expected to be overwritten by the contents of the DataFrame.  |
| **append**  | append mode means when saving a DataFrame to a data source, if data/table already exists, contents of the DataFrame are expected to be appended to existing data.  |
| **ignore**  | ignore mode means that when saving a DataFrame to a data source, if data already exists, the save operation is expected not to save the contents of the DataFrame and not to change the existing data. This is similar to `CREATE TABLE IF NOT EXISTS` in SQL.  |
| **error** (or **errorifexists**)  | error mode means when saving a DataFrame to a data source, if data already exists, an exception is expected to be thrown. This is the default mode. |

#### Way 1 : Writing file via `DataFrame.write.csv()` method 
To write a PySpark DataFrame as CSV, use the `csv()` method of the `DataFrameWriter` returned by `df.write`. This method takes a path as an argument. Spark creates a directory at that path and writes one or more part files into it, rather than a single CSV file.

In [0]:
# Read a CSV file with chaining multiple options
df_read = spark.read.option("inferSchema", True) \
                .option("delimiter", ",") \
                .option("header", True) \
                .csv("/FileStore/tables/2010_summary.csv")

# Display the result dataframe 
display(df_read.head(5))
df_read.printSchema()

# Save and Write DataFrame to CSV File via write().csv() method 
df_read.write.option("header",True) \
               .mode('ignore') \
               .csv("/FileStore/tables/2010_summary_write")

DEST_COUNTRY_NAME,ORIGIN_COUNTRY_NAME,count
United States,Romania,1
United States,Ireland,264
United States,India,69
Egypt,United States,24
Equatorial Guinea,United States,1


root
 |-- DEST_COUNTRY_NAME: string (nullable = true)
 |-- ORIGIN_COUNTRY_NAME: string (nullable = true)
 |-- count: integer (nullable = true)



#### Way 2 : Writing file via `DataFrame.write.format().save()` method 
To write a PySpark DataFrame as CSV, you can also call `format("csv").save()` on the `DataFrameWriter` returned by `df.write`. The `save()` method takes a path as an argument. As with `csv()`, Spark creates a directory at that path and writes part files into it.

In [0]:
# Read a CSV file with chaining multiple options
df_read = spark.read.option("inferSchema", True) \
                .option("delimiter", ",") \
                .option("header", True) \
                .csv("/FileStore/tables/2010_summary.csv")

# Keep only the rows whose origin country is India and display them
df_less = df_read.where(df_read.ORIGIN_COUNTRY_NAME=='India')
display(df_less)

# Save and Write DataFrame to CSV File via format().save() method
df_less.write.format("csv").option("header",True) \
                .mode('overwrite') \
                .save("/FileStore/tables/2010_summary_write.csv")

DEST_COUNTRY_NAME,ORIGIN_COUNTRY_NAME,count
United States,India,69


In [0]:
%fs
ls FileStore/tables/

path,name,size,modificationTime
dbfs:/FileStore/tables/2010_summary.csv,2010_summary.csv,7121,1728547018000
dbfs:/FileStore/tables/2010_summary_write/,2010_summary_write/,0,0
dbfs:/FileStore/tables/2010_summary_write.csv/,2010_summary_write.csv/,0,0
dbfs:/FileStore/tables/2010_summary_write_02/,2010_summary_write_02/,0,0
dbfs:/FileStore/tables/NewFile/,NewFile/,0,0
dbfs:/FileStore/tables/RangeFile/,RangeFile/,0,0
dbfs:/FileStore/tables/RangeText/,RangeText/,0,0
dbfs:/FileStore/tables/RangeText.txt/,RangeText.txt/,0,0
dbfs:/FileStore/tables/SparkText,SparkText,1761,1728552324000
dbfs:/FileStore/tables/SparkText.txt,SparkText.txt,513,1728828118000


#### Save Dataframes to Persistent Table 

- DataFrames can also be saved as persistent tables into Hive metastore using the `saveAsTable` command. Spark will create a default local Hive metastore (using Derby) for you. `saveAsTable` will materialize the contents of the DataFrame and create a pointer to the data in the Hive metastore. 

- Persistent tables will still exist even after your Spark program has restarted, as long as you maintain your connection to the same metastore. A DataFrame for a persistent table can be created by calling the table method on a SparkSession with the name of the table.

- For file-based data source, e.g. text, parquet, json, etc. you can specify a custom table path via the path option, e.g. `df.write.option("path", "/some/path").saveAsTable("t")`. When the table is dropped, the custom table path will not be removed and the table data is still there. If no custom table path is specified, Spark will write data to a default table path under the warehouse directory. When the table is dropped, the default table path will be removed too.

- **Bucketing, Sorting and Partitioning :** For file-based data sources, it is also possible to bucket and sort or partition the output. Bucketing and sorting are applicable only to persistent tables.

In [0]:
df_read = spark.read.option("inferSchema", True) \
                .option("delimiter", ",") \
                .option("header", True) \
                .csv("/FileStore/tables/2010_summary.csv")

# Keep only the rows whose origin country is Romania and display them
df_less = df_read.where(df_read.ORIGIN_COUNTRY_NAME=='Romania')
display(df_less)

(df_less.write.mode('append')
    # .partitionBy("DEST_COUNTRY_NAME")
    # .bucketBy(1, "ORIGIN_COUNTRY_NAME")
    # .sortBy('count')
    .saveAsTable("2010_Romania"))
    

DEST_COUNTRY_NAME,ORIGIN_COUNTRY_NAME,count
United States,Romania,1
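
The bucketing, sorting and partitioning bullet above can be sketched as a small helper. The function name, the table name in the usage comment and the bucket count are all hypothetical, and `df` is assumed to be a PySpark DataFrame. The format is set to parquet explicitly as an assumption, because some table formats (Delta, for example) may reject `bucketBy`.

```python
def save_partitioned_bucketed(df, table_name, partition_col, bucket_col, sort_col, num_buckets=4):
    """Save `df` as a persistent table that is partitioned, bucketed and sorted.

    `df` is assumed to be a PySpark DataFrame; all column names come from the caller.
    """
    (
        df.write.format("parquet")           # assumption: a format that supports bucketing
        .mode("overwrite")
        .partitionBy(partition_col)          # one sub-directory per distinct value
        .bucketBy(num_buckets, bucket_col)   # hash rows into a fixed number of buckets
        .sortBy(sort_col)                    # sort rows within each bucket
        .saveAsTable(table_name)             # bucketBy/sortBy only work with saveAsTable
    )


# Hypothetical call for the flight data read above (not executed here):
# save_partitioned_bucketed(df_read, "flights_2010_bucketed",
#                           "DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "count")
```

The partition column and the bucket column are kept different here, since the same column cannot serve both purposes.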


#### Generic File Source Options

These generic options/configurations are effective only when using file-based sources: parquet, orc, avro, json, csv, text.

##### 1. Ignore Corrupt Files
Spark allows you to use the configuration spark.sql.files.ignoreCorruptFiles or the data source option ignoreCorruptFiles to ignore corrupt files while reading data from files. When set to true, the Spark jobs will continue to run when encountering corrupted files and the contents that have been read will still be returned.

In [0]:
# enable ignore corrupt files via the data source option
test_corrupt_df0 = spark.read.option("ignoreCorruptFiles", "true") \
                             .csv("/FileStore/tables/")
    
test_corrupt_df0.show()  # show() prints the rows itself and returns None

# enable ignore corrupt files via the configuration
spark.sql("set spark.sql.files.ignoreCorruptFiles=true")

test_corrupt_df1 = spark.read.csv("/FileStore/tables/")

test_corrupt_df1.show()

+--------------------+-------------------+-----+
|                 _c0|                _c1|  _c2|
+--------------------+-------------------+-----+
|   DEST_COUNTRY_NAME|ORIGIN_COUNTRY_NAME|count|
|       United States|            Romania|    1|
|       United States|            Ireland|  264|
|       United States|              India|   69|
|               Egypt|      United States|   24|
|   Equatorial Guinea|      United States|    1|
|       United States|          Singapore|   25|
|       United States|            Grenada|   54|
|          Costa Rica|      United States|  477|
|             Senegal|      United States|   29|
|       United States|   Marshall Islands|   44|
|              Guyana|      United States|   17|
|       United States|       Sint Maarten|   53|
|               Malta|      United States|    1|
|             Bolivia|      United States|   46|
|            Anguilla|      United States|   21|
|Turks and Caicos ...|      United States|  136|
|       United State

##### 2. Ignore Missing Files
Spark allows you to use the configuration spark.sql.files.ignoreMissingFiles or the data source option ignoreMissingFiles to ignore missing files while reading data from files. Here, missing file really means the deleted file under directory after you construct the DataFrame. When set to true, the Spark jobs will continue to run when encountering missing files and the contents that have been read will still be returned.
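
A minimal sketch of the data source option form, mirroring the corrupt-files cell above. The helper name `read_csv_ignoring_missing` is hypothetical and `spark` is assumed to be an active SparkSession.

```python
def read_csv_ignoring_missing(spark, path):
    """Read CSV files under `path`, skipping files that disappear before they are read.

    `spark` is assumed to be an active SparkSession.
    """
    return (
        spark.read.option("ignoreMissingFiles", "true")
        .option("header", True)
        .csv(path)
    )


# Session-wide alternative using the configuration key from the text:
# spark.conf.set("spark.sql.files.ignoreMissingFiles", "true")

# Hypothetical call inside the notebook (not executed here):
# df_safe = read_csv_ignoring_missing(spark, "/FileStore/tables/")
```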

##### 3. Path Glob Filter
pathGlobFilter is used to only include files with file names matching the pattern. The syntax follows org.apache.hadoop.fs.GlobFilter. It does not change the behavior of partition discovery.
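
To preview which file names a simple pattern such as `*.csv` would keep, the sketch below uses Python's standard `fnmatch` module. This is only an approximation of Hadoop's `GlobFilter`, adequate for simple wildcard patterns, and it does not touch Spark at all. The names are taken from the `%fs ls` listing earlier in this notebook.

```python
from fnmatch import fnmatchcase

# Plain file names from the earlier `%fs ls` listing.
names = ["2010_summary.csv", "SparkText", "SparkText.txt"]

# fnmatchcase does a case-sensitive wildcard match on each name.
matching = [name for name in names if fnmatchcase(name, "*.csv")]
print(matching)  # ['2010_summary.csv']
```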

In [0]:
df = spark.read.load("/FileStore/tables", format="csv", pathGlobFilter="*.csv")
df.show()

+--------------------+-------------------+-----+
|                 _c0|                _c1|  _c2|
+--------------------+-------------------+-----+
|   DEST_COUNTRY_NAME|ORIGIN_COUNTRY_NAME|count|
|       United States|            Romania|    1|
|       United States|            Ireland|  264|
|       United States|              India|   69|
|               Egypt|      United States|   24|
|   Equatorial Guinea|      United States|    1|
|       United States|          Singapore|   25|
|       United States|            Grenada|   54|
|          Costa Rica|      United States|  477|
|             Senegal|      United States|   29|
|       United States|   Marshall Islands|   44|
|              Guyana|      United States|   17|
|       United States|       Sint Maarten|   53|
|               Malta|      United States|    1|
|             Bolivia|      United States|   46|
|            Anguilla|      United States|   21|
|Turks and Caicos ...|      United States|  136|
|       United State

##### 4. Recursive File Lookup
recursiveFileLookup is used to recursively load files and it disables partition inferring. Its default value is false. If data source explicitly specifies the partitionSpec when recursiveFileLookup is true, exception will be thrown.

In [0]:
recursive_loaded_df = spark.read.format("csv")\
    .option("recursiveFileLookup", "true")\
    .load("/FileStore/tables")
    
recursive_loaded_df.show()

+--------------------+-------------------+-----+
|                 _c0|                _c1|  _c2|
+--------------------+-------------------+-----+
|   DEST_COUNTRY_NAME|ORIGIN_COUNTRY_NAME|count|
|       United States|            Romania|    1|
|       United States|            Ireland|  264|
|       United States|              India|   69|
|               Egypt|      United States|   24|
|   Equatorial Guinea|      United States|    1|
|       United States|          Singapore|   25|
|       United States|            Grenada|   54|
|          Costa Rica|      United States|  477|
|             Senegal|      United States|   29|
|       United States|   Marshall Islands|   44|
|              Guyana|      United States|   17|
|       United States|       Sint Maarten|   53|
|               Malta|      United States|    1|
|             Bolivia|      United States|   46|
|            Anguilla|      United States|   21|
|Turks and Caicos ...|      United States|  136|
|       United State

##### 5. Modified Time Path Filter
modifiedBefore and modifiedAfter are options that can be applied together or separately in order to achieve greater granularity over which files may load during a Spark batch query. (Note that Structured Streaming file sources don’t support these options.) When a timezone option is not provided, the timestamps will be interpreted according to the Spark session timezone (spark.sql.session.timeZone).
- **modifiedBefore:** an optional timestamp to only include files with modification times occurring before the specified time. The provided timestamp must be in the following format: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
- **modifiedAfter:** an optional timestamp to only include files with modification times occurring after the specified time. The provided timestamp must be in the following format: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
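
Both options expect the same timestamp layout, so it can help to build the strings from `datetime` objects instead of typing them by hand. The sketch below does that. The helper names `to_spark_timestamp` and `modified_window_options` are hypothetical, and the final comment shows an assumed way of passing the result to the reader.

```python
from datetime import datetime

# strftime pattern matching the YYYY-MM-DDTHH:mm:ss layout required by both options.
SPARK_TS_FORMAT = "%Y-%m-%dT%H:%M:%S"


def to_spark_timestamp(moment: datetime) -> str:
    """Format a datetime the way modifiedBefore / modifiedAfter expect it."""
    return moment.strftime(SPARK_TS_FORMAT)


def modified_window_options(start: datetime, end: datetime) -> dict:
    """Options selecting files modified after `start` and before `end`."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    return {
        "modifiedAfter": to_spark_timestamp(start),
        "modifiedBefore": to_spark_timestamp(end),
    }


window = modified_window_options(datetime(2020, 6, 1, 13, 0, 0), datetime(2020, 6, 2, 13, 0, 0))
print(window)
# {'modifiedAfter': '2020-06-01T13:00:00', 'modifiedBefore': '2020-06-02T13:00:00'}

# Hypothetical use inside the notebook (not executed here):
# spark.read.options(**window).format("csv").load("/FileStore/tables")
```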

In [0]:
# Only load files modified before 2024-10-14 @ 05:30:00
df_before = spark.read.load("/FileStore/tables",
                     format="csv", modifiedBefore="2024-10-14T05:30:00")
df_before.show()

# Only load files modified after 2024-10-14 @ 08:30:00
df_after = spark.read.load("/FileStore/tables",
                     format="csv", inferSchema=True, modifiedAfter="2024-10-14T08:30:00")
df_after.show()

+--------------------+-------------------+-----+
|                 _c0|                _c1|  _c2|
+--------------------+-------------------+-----+
|   DEST_COUNTRY_NAME|ORIGIN_COUNTRY_NAME|count|
|       United States|            Romania|    1|
|       United States|            Ireland|  264|
|       United States|              India|   69|
|               Egypt|      United States|   24|
|   Equatorial Guinea|      United States|    1|
|       United States|          Singapore|   25|
|       United States|            Grenada|   54|
|          Costa Rica|      United States|  477|
|             Senegal|      United States|   29|
|       United States|   Marshall Islands|   44|
|              Guyana|      United States|   17|
|       United States|       Sint Maarten|   53|
|               Malta|      United States|    1|
|             Bolivia|      United States|   46|
|            Anguilla|      United States|   21|
|Turks and Caicos ...|      United States|  136|
|       United State

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
File <command-3481412086443986>:7
      4 df_before.show()
      6 # Only load files modified after 06/01/2050 @ 08:30:00
----> 7 df_after = spark.read.load("/FileStore/tables",
      8                      format="csv", inferSchema=True, modifiedAfter="2024-10-14T08:30:00")
      9 df_after.show()

File /databricks/spark/python/pyspark/instrumentation_utils.py:48, in _wrap_function.<locals>.wrapper(*ar